Physics-based Residual Kriging for dynamically evolving functional random fields

We present a novel approach named Physics-based Residual Kriging for the statistical prediction of spatially dependent functional data. It incorporates a physical model—expressed by a partial differential equation—within a Universal Kriging setting through a geostatistical modelization of the residuals with respect to the physical model. The approach is extended to deal with sequential problems, where samples of functional data become available along consecutive time intervals, in a context where the physical and stochastic processes generating them evolve, as time intervals succeed one another. An incremental modeling is used to account for both these dynamics and the misfit between previous predictions and actual observations. We apply Physics-based Residual Kriging to forecast production rates of wells operating in a mature reservoir according to a given drilling schedule. We evaluate the predictive errors of the method in two different case studies. The first deals with a single-phase reservoir where production is supported by fluid injection, while the second considers again a single-phase reservoir but the production is driven by rock compaction.


Introduction
Functional Data Analysis (FDA, Ramsay 2005) is an area of statistics developed in the recent years to deal with infinite-dimensional data, such as functions and curves. In many applications, functional data are also geo-referenced, i.e. observed at specific locations within the spatial domain of interest. In these cases, the spatial association among functional data has to be modeled. To deal with functional data with a spatial dependence, a number of geostatistical methods for modeling and prediction of scalar data (Cressie 2015) have been extended to functional data, leading to functional Ordinary Kriging and functional Universal Kriging predictors (Caballero et al. 2013;Menafoglio et al. 2013Menafoglio et al. , 2016. These techniques belong to the more general framework of Object Oriented Spatial Statistics (O2S2, Secchi 2017, 2019), which allows for the analysis of general types of object data (e.g., functional, distributional, or Riemannian data) by considering them as points of an appropriate mathematical space (e.g., Hilbert space or Riemannian manifold), through a geometrical and topological approach, see, for instance, Menafoglio et al. (2014).
In many processes over spatial domains, not only data are observed, but also prior information about the phenomenon under study is available in terms of a physical model, often formulated through Partial Differential Equations (PDEs). These physical models can be used to predict a functional response at any location of the domain, the spatio-temporal evolution of the phenomenon being precisely modeled through the governing physical laws, such as mass or energy conservation. Of course, one may expect a discrepancy between these predictions and the observations, due to many factors such as over-simplified model assumptions, uncertain model parameters, numerical discretization errors and aleatory variability (Quarteroni and Valli 1994;Smith 2013). Still, the physical model can provide meaningful information and, when taken into account, can potentially lead to better predictions than a pure data-driven model.
In the recent years, a number of authors presented approaches which exploit physical models to improve the modeling of spatial dependent data. A notable example is the so-called regression with differential regularization (Arnone et al. 2019;Azzimonti et al. 2015;Bernardi et al. 2017), which models spatially dependent data by means of a spatial regression penalized with a differential operator, which in turn expresses the physical information on the problem under study. In this case, the differential operator is used to support predictions in regions sparsely observed; the obtained predictions are typically biased towards the physical model solution. Another approach is illustrated in Chen and Baker (2019); Rodat et al. (2018); Yang et al. (2018Yang et al. ( , 2019, where the physical model is used to improve estimations of the spatial dependence among the (scalar) data, especially when few observations are available. In this case, the spatial covariance is estimated from multiple physical model realizations, thus avoiding the fit of a parametric model to empirical variograms or covariograms. Constantinescu and Anitescu (2013) propose a way to derive valid covariance models which incorporates physical relations between the variables in a multivariate setting. In this broad framework, none of the works presented in the literature specifically deals with problems where the response variables are functional objects, sequentially observed over time.
As a first element of innovation, we here propose an original approach, that we term Physics-based Residual Kriging (Phy-RK) for functional data, which incorporates the physical model for the phenomenon in a functional geostatistical setting. Having collected the observation of the process at a set of locations, we decouple the observed functional process in a physically-driven term and in a stochastic residual, and accordingly model the latter by means of functional Universal Kriging. We show that this method, beside being unbiased, can lead to substantial improvements, in terms of prediction error, when compared to the sole physical model or to the purely datadriven Kriging approach. Furthermore, we show that the information from the physical model can be effectively used also in the residuals modeling, through the inclusion of covariates derived from the physical model solution.
As a key additional element of novelty, we formulate the Phy-RK approach for sequential problems, namely for problems where sets of functional observations are sequentially observed along a number of time intervals. Here, the experimental design proceeds incrementally, including at each time interval a new set of observed locations, which are added to the locations observed at previous intervals. In this case, our methodology aims to address problems where (i) the sequential design itself implies a dynamic change in the process generating the data, and, consequently, (ii) repeated observations at the same locations are generated by differing random fields when collected at different time intervals. Due to the intrinsic dynamical nature of the problem, we develop an incremental formulation of the Phy-RK predictor which not only accounts for the residuals between the physics-based drift and the latest observations (first-order residuals), but also for the residuals between the previous predictors and the actual observations (higher-order residuals), allowing for a dynamic correction of Kriging predictions.
Starting from the work of Kennedy and O'Hagan (2001), a similar approach has been developed under the name of model calibration. In this framework, the parameters of a computational simulation (physical model) are estimated (calibrated), by jointly modeling its residual with a Gaussian random field. Both Bayesian (Kennedy and O'Hagan 2001) and frequentist (Tuo and Wu 2015) paradigms have been developed. In our Phy-RK approach, we adopt a different point of view, assuming that a physical model is already available, with parameters calibrated independently of the residual modelization. Furthermore, we here adopt the trace-Universal Kriging (Menafoglio et al. 2013) predictor for the residual modeling, avoiding a basis expansion approach for functional data, as in Bayarri et al. (2007);Higdon et al. (2008), which may result in very high-dimensional analyses when the basis dimension increases. Another element of novelty with respect to this literature stream relies on the sequential analysis of the prediction residuals, which is key to provide an effective modeling framework for evolving random fields. Indeed, our framework allows one to deal with problems where observations become available in a sequential manner, as in Wang et al. (2019), but, more importantly, it can also be used when it is precisely the random field generating the data that evolves along the sequential sampling of the data. The latter is the core problem which we illustrate in this work.
The Phy-RK predictor has a broad applicability whenever the phenomenon under study exhibits spatial dependence and prior information is available through a physical model, for instance in environmental sciences, and more specifically in meteorology and pollution modeling (Webster and Oliver 2007), and in public health problems (Diggle and Giorgi 2019). In this work, we embed the formulation of Phy-RK within the problem of predicting production rates, measured in STB/day, of producer wells operating in a mature conventional reservoir. Here, we aim at forecasting production rates of wells yet to be drilled at a set of locations in the reservoir, based on the observed past production of the wells which were active in previous time intervals. Production forecast is necessary to quantify investments, e.g. number and type of new wells, and to optimize field operations.
Unlike unconventional shale gas reservoirs, in which a pure geostatistical method has been successfully applied (Menafoglio et al. 2015; Xi and Morgan 2019), the application of such techniques for production forecast of wells yet to be drilled in a conventional reservoir has not been investigated so far. The main reason is that drilling a new well perturbs the pressure of the whole reservoir, changing the random field realization observed through the production rates of previously operating wells. Applying a pure geostatistical method, based on functional Kriging, would not appropriately model this dynamical nature of the process. For instance, if more than one wells were open in the successive time interval, a purely geostatistical approach would not take into account their interaction (e.g., if two wells were drilled close to each other, one would expect lower productions, due to their interaction, than placing a single well). This phenomenon is somehow negligible in an unconventional reservoir, as in this type of systems the perturbation happens at a local scale due to their peculiar rock properties. For these reasons, the standard approach to production forecast in conventional systems is the reservoir numerical simulation (Aziz and Settari 1979), where the reservoir domain is discretized and the governing equations of fluid flow in porous media are solved. Surrogate models can be developed allowing one to avoid setting many parameters, such as porosity and permeability of each cell of a fine reservoir discretization, and long execution times even on modern computational infrastructures. Some examples are the Capacitance Resistance Models (CRM, Wanderley de Holanda et al. 2018), the Interwell Numerical SImulation Model (INSIM, Zhao et al. 2015) and the recently developed FlowNet (Kiaerr et al. 2020). In this context, Phy-RK could exploit, in a data-driven framework, a surrogate model, i.e. a physical model which is characterized by fewer parameters and is less computationally demanding than a full-physics 3D reservoir simulation. Furthermore, our Phy-RK methodology overcomes the above-mentioned limitations due to the static nature of a pure geostatistical approach, as the wells drilling schedule is taken into full account by the physical model.
We present two applications of Phy-RK to the problem of forecasting production rates. Both concern single phase reservoirs; for the first one, production is supported by fluid injection, whereas, for the second one, by rock compaction. In both cases, wells are incrementally drilled in the reservoir according to a schedule which consists of eight time steps. At each step, the aim is to predict the production rates of the new wells drilled in that step as well as the future production rates of those already operating, based on the production data observed up to the previous step at the wells already operating. We shall show that the Phy-RK approach leads to substantially lower prediction errors than a pure geostatistical method, which will thus be shown to be inappropriate for the prediction of production rates in a conventional reservoir.
The work is organized as follows. Section 2 introduces the mathematical formulation of Phy-RK predictor, with a focus on its sequential formulation. In Sect. 3, we adapt the general approach of Phy-RK to the prediction of production rates in a mature reservoir. Section 4 presents a first application of Phy-RK for the prediction of liquid production rates in a single-phase reservoir supported by liquid injection. Section 5 presents a second application of Phy-RK for the prediction of liquid production rates supported by rock compaction. Finally, Sect. 5.3 outlines conclusions and considerations for future development and applications.
2 Physics-based Residual Kriging predictor for functional data

Formulation of physics-based Residual Kriging
Consider a probability space ðX; F ; PÞ and a (functional) random object Y : X ! H, where H is a separable Hilbert space, endowed with the norm k Á k induced by the scalar product hÁ; Ái. We define the random field fY s ; s 2 D R d g; where s indicates the spatial location, included in the spatial domain D, associated with the corresponding random function Y s . We assume that, for each s, Our goal is the prediction of the functional variable at location s 0 , having observed the random field at locations fs 1 ; . . .; s n g.
We suppose that the phenomenon under investigation can be described by means of a deterministic model, able to approximately predict Y s . We call these predictions F s ðhÞ, where h 2 H & R m are the model parameters, and we decompose the functional variables Y s as being X s the stochastic residual from the deterministic model prediction.
In this work, we focus our attention on physics-based models, mathematically formulated by means of Partial Differential Equations (PDEs), which express physical principles, such as mass or energy conservation, into the mathematical relationship between partial derivatives of functions (for an introduction see, e.g., Evans 2010; Salsa 2015). In particular, we formulate the physical model as the differential problem where uðhÞ is the solution, LðhÞ is a differential operator and gðhÞ is the forcing term, which all depend on the vector of parameters h. The determination of these parameters is out of the scope of this work, although our method can partially correct their misspecification (as discussed in Ssect. 4.4). We assume that problem (1), coupled with appropriate boundary conditions on the boundary of the domain D, oD, and, if time-dependent, with an initial condition, is well-posed. In particular, we require that problem (1) admits a unique weak solution uðhÞ 2 V for each h 2 H, where V is an appropriate Hilbert space. The physical model prediction is thus given by where F s : V ! H is a continuous bounded operator which maps the PDE solution into the physical model prediction at location s 2 D. Although we restrict ourselves to PDE models, this is not a limitation of our approach and more general models can be considered.
We combine the deterministic model (2) with a geostatistical model for the residual random field fX s ; s 2 D R d g: Indeed, the residual random field might exhibit spatial dependence, which results from the spatial structure that was already present in the random field Y s but not completely captured by the physical model. This can be due, for instance, to simplified assumptions of the physical model, to numerical discretization errors in the PDE solution or to the presence of additional influent variables not included in the physical model. We point out that, in the residual part, physics can still play a role, which however is modeled following a geostatistical approach.
In this work, we rely on O2S2, with particular reference to Universal Kriging (UK) for functional observations (Menafoglio et al. 2013). This geostatistical framework allows one to predict the (functional) residual at unsampled locations within the domain, by modeling the spatial dependence of the residual field, then building, on this basis, the Universal Kriging predictor. We thus define the expected value m s of the residual random field as the integral being defined in the Bochner sense, and the global covariance function (also called trace-covariogram) C : D Â D ! R, as Note that the covariance function C allows one to quantify the spatial dependence between two elements of the field corresponding to two locations in the spatial domain.
Recall that, in a functional setting (Menafoglio et al. 2013), a random field fX s ; s 2 D R d g is said to be globally second-order stationary if E½X s ¼ m for all s 2 D and the separating vector between two locations). Furthermore, a second-order sta- Although the isotropy assumption can be easily weakened, we stick to it for ease of notation. We assume that the random field fX s ; s 2 D R d g can be decomposed as the sum of a non-stationary mean and a stationary residual, i.e.
where m s is the mean term (a.k.a. drift) and d s is a zeromean, second-order stationary and isotropic residual random field. For the mean term, we assume a linear model where f 0 is the intercept (i.e, f 0 ðsÞ ¼ 1 for all s 2 D), f 1 ðsÞ; . . .; f L ðsÞ 2 R are known spatially dependent covariates and b 0 ðÁÞ; . . .; b L ðÁÞ are functional parameters in H, which do not depend on the spatial location. The model (3) could be generalized by considering a non-stationary residual random field d s . However, the estimation of the covariance function in this case would be impractical in presence of few observations. A possible solution is represented by the Physics-informed Kriging (Yang et al. 2018), where the covariance function is estimated from stochastic simulations of a physical model. In practice, having collected n observations fX s 1 ; . . .; X s n g of the residual random field at locations fs 1 ; . . .; s n g-obtained as differences between the observed fY s 1 ; . . .; Y s n g and the purely physics-based predictions fF s 1 ; . . .; F s n g -our goal is the prediction of the variable Y s 0 at the unsampled location s 0 in D. It is computed as the sum of the physical model prediction F s 0 and the residual X s 0 predicted by Universal Kriging, which is the best linear unbiased predictor, of the form: where the optimal weights ðk 1 ; . . .;k n Þ solve the optimization problem: Thus, the weights are chosen by minimizing the variance of the prediction error, subject to the (uniform) unbiasedness constraint. Under the assumption that the covariance function C is known, the optimization problem (6) can be equivalently formulated in terms of the following linear system where ðl 0 ; Á Á Á ; l L Þ are the Lagrange multipliers associated with the L linear constraints derived from the unbiasedness constraint in (6), see Menafoglio et al. (2013) for further details. Solving the linear system, we obtain the optimal weights and, thus, the corresponding predictionX s 0 . Eventually, we compute the Physics-based Residual Kriging (Phy-RK) prediction for the functional variable Y s 0 aŝ In applications, one usually does not know the trace-covariogram C and thus needs to estimate it from the observations. This is done by estimating the trace-semivariogram cðh i;j Þ ¼ Cð0Þ À Cðh i;j Þ, in two steps. Firstly, a method-of-moments estimator is used (i.e., a binned tracevariogram, see, e.g., Menafoglio et al. 2013), then a valid parametric variogram model (e.g., spherical, Matérn, see Cressie 2015) is fitted by optimizing its parameters in, e.g., a least-squares sense. This procedure is iterated until convergence, as one needs to compute the residuals d s having estimated the coefficients b 0 ; . . .; b L through Generalized Least Squares, as explained in Menafoglio et al. (2013). Finally, the trace-covariogram is easily obtained from the trace-semivariogram similarly as in classical geostatistics. Note that, in the setting of our work, the spatial dependence is estimated from the observations and not from multiple realizations of the physical model, as in Chen and Baker (2019); Rodat et al. (2018); Yang et al. (2018), because it is referred to the residuals from the physics model predictions (2). This is compatible with the simplified nature of our physical model (1), and allows us to reduce the overall computational burden of the procedure, as one avoids to repeatedly solve the PDE used to model the phenomenon under study.

Residual Kriging for sequential predictions
In this subsection, we extend the general Physics-based Residual Kriging method presented in Sect. 2.1 for tackling the problem of sequential prediction. In this context, let us consider the whole time domain ½t 0 ; t N a ¼ T and subdivide it into N a activation time intervals, indicating the subintervals with We assume that, in each interval a i , a realization of the functional-valued random field fY s;a i ; s 2 D R d g, which depends on the interval, has been sampled at a set of locations in the spatial domain D. From now on, we omit the apex i of the locations, implicitly assuming that they belong to the set W i specified by the activation interval a i . Furthermore, each random field is characterized by a covariogram C a i , which can vary among different intervals. This mathematical setting describes, for instance, the prediction of production rates in a conventional reservoir. Indeed, we observe the production rates of currently operating wells in the time interval a i , but when additional wells are drilled, the whole reservoir is perturbed, leading to a possibly different random field in the next interval a iþ1 .
Given a location s 0 sampled during the interval a i with i ! 2, our goal is to predict the corresponding functional observation, given the functional data sampled in the interval a iÀ1 . We embed the general Phy-RK predictor (8) in this setting, by defining the predictor which is obtained as the sum of the physics-based prediction F s 0 ;a i ðhÞ in the interval a i with wells configuration W i , and the residual predicted by functional Universal Kriging (5). In particular, the residual predictor has the following form where the optimal weightsk j;a i are found by solving problem (6) based on the information available at the latest step (i.e., X s j ;a jÀ1 ; s j 2 W iÀ1 ). Note that, in general, the intervals a i and a iÀ1 may have different lengths. This implies that, if interval a iÀ1 is longer than interval a i , the predictionX s 0 ;a i will be defined on the whole interval a i . If this is not the case, the predictionX s 0 ;a i will be defined only on the the initial part of the interval a i (i.e., on an interval domain whose length coincides with that of a iÀ1 ). Moreover, we point out that the residuals at the locations that were observed in a iÀ1 are predicted precisely as the observed residuals in the previous time interval, i.e., due to the interpolating property of Kriging. However, note that, in general, the Phy-RK predictionŶ s j ;a i will not coincide with the response Y s j ;a iÀ1 observed in the previous interval, as the physics-driven prediction F s 0 ;a i ðhÞ evolves along the intervals. In practice, to build predictor (10), we would need to estimate the covariogram C a i from the residuals of interval a i , which however are not available at the beginning of interval a i . Therefore, we employ the estimate obtained with residuals observed in interval a iÀ1 , implicitly assuming that the covariogram does not abruptly vary between consecutive intervals. In fact, the viability of using the predictive model (9) in a sequential framework itself depends on the stability of the process over timeboth in terms of the physics-based model and residuals field.

Physics-based Residual Kriging with sequential update
If the residuals among consecutive activation intervals exhibit a strong variation, due for instance to the strong evolutionary nature of the considered problem (as the one illustrated in Sect. 5), predictor (9) might be unable to adapt to the dynamic of the process itself. In this case, we may introduce additional terms, dynamically updated, to further correct the predictive model based on the misfit between observations and predictions in the previous activation interval. We illustrate the construction of such corrective terms in the first activation time intervals, the following intervals representing just a straightforward generalization. Predictor (9) can be used to produce predictions starting from the second interval a 2 , as one needs to observe the residuals in a 1 to be able to predict them in a 2 . If we now move to the interval a 3 , we can predict two residuals: (i) the residuals from the physical model X ð1Þ s j ;a 2 , and (ii) the residuals from the predictive model used to carry out the forecast at the previous step, i.e, the one formulated in (9). These latter residuals, denoted as X ð2Þ s j ;a 2 , are informative on the possible misfit of the latest available predictive model which may represent repeatable effects potentially observable in successive time instants. In fact, the secondorder residuals X ð2Þ s j ;a 2 can be used to update the predictions in the third activation interval, correcting them by the observed misfits in the second interval.
Thus, for third activation interval, we can build a corrected predictor where the two predicted residualsX ð1Þ s 0 ;a 3 andX ð2Þ s 0 ;a 3 are obtained through functional Kriging of the latest available observations of first-and second-order residuals, i.e., j;a 3 X ð1Þ s j ;a 2 ; j;a 3 X ð2Þ s j ;a 2 ; with X ð1Þ s j ;a 2 ¼ Y s j ;a 2 À F s j ;a 2 and X ð2Þ s j ;a 2 ¼ Y s j ;a 2 À F s j ;a 2 ÀX ð1Þ s j ;a 2 . In practice, one can decide how complex the predictive model should be (i.e., how many residuals to include): (i) no residuals (Phy-RK-0), i.e. the prediction is determined by the physical model only (the only possible predictive model at step a 1 ), (ii) one residual (Phy-RK-1), correcting the prediction of the physical model with first-order residuals (the complete predictive model at step a 2 ) or (iii) two residuals (Phy-RK-2), further correcting the predictive model in a 2 with second-order residuals. This set of choices is depicted in Fig. 1, where these possibilities at a generic location s 0 are shown. In the first interval, the only possible prediction is F s 0 ;a 1 , i.e., the physical model prediction; no residuals can be used (Phy-RK-0, green line). At the end of the first interval, having measured the functional observation in s 0 , Y s 0 ;a 1 , one can note, for instance, that the physical model actually overestimates the response in a 1 , producing a (positive) residual (represented as green arrows). On this basis, one can choose whether to correct the predictive model based on the physics by including the first order residuals. This would yield a corrected prediction (Phy-RK-1, blue line). In the third interval, one can choose between the physical model (Phy-RK-0, green line), one residual (Phy-RK-1, blue line) and two residuals (Phy-RK-2, dark yellow line) predictions, based on the observed misfit between the observations and the two possible predictive models at the previous step (Phy-RK-0, or Phy-RK-1). These residuals are represented as colored arrows (green for Phy-RK-0, blue for Phy-RK-1).
It is then clear that, at each interval, an additional residual is observed. Indeed, at the end of the i-th interval, one observes residuals up to order i, although up to (i À 1) residuals could be used for the prediction in the same interval, because the residuals prediction at interval a i was based only on the residuals available at the end of a iÀ1 . Note also that the residuals used for prediction at the i-th interval are those available at the latest interval (i À 1), and not those at previous intervals. Indeed, fX s 1 ; . . .; X s n g are here interpreted as the residuals of the predictive models which can be used to perform prediction, rather than the residuals of the actual previous predictions. This is consistent with the dynamic modeling of the system, according to which previous observations of the residuals become soon obsolete as the schedule proceeds, and thus, poorly informative due to the phenomenon dynamics.
Iterating the same argument in the following intervals by adding higher-order residual terms as soon as they become available, the predictor at the i-th step can be defined aŝ where K i i À 1 is the number of residual terms modeled in the i-th activation interval. Note that, if no residual terms (hereafter denoted by K i ¼ 0) are taken into account, the prediction coincides with that generated by the sole physical model. Each residual is predicted aŝ j;a i X ðkÞ s j ;a iÀ1 ; where the observed residuals are given by The optimal weightsk ðkÞ j;a i are found by solving the Kriging system (7), in which the covariogram C a i is estimated from the residuals observed in the previous interval. Thus, it can differ among different time intervals and different residual terms.
The parameter K i is chosen by selecting the number of residuals which produces the lowest average error in the previous interval. This procedure allows us to avoid overfitting, as, in a sequential framework, the choice of K i in a i is based on a test set disjoint from the available observations (i.e., the observations in the interval a iÀ1 ). Thus, K i might vary among the intervals, as we further discuss in Sects. 4 and 5.

Prediction of production rates in a mature reservoir
In this work, we apply the Physics-based Residual Kriging methodology illustrated in Sect. 2 for the prediction of liquid production rates of wells operating in a mature reservoir. In particular, having observed the production rates, which are time-dependent functional data, of some operating wells during a certain time interval, our goal is to forecast the production rates of existing and newly drilled wells in the subsequent time interval.
Referring to the mathematical formulation presented in Sect. 2.3, we thus indicate with Y s;a i the production rate over time of a well placed at location s 2 D operating in interval a i , and with F s;a i ðhÞ the production rate predicted by a physical model suitable for the specific application, as those described in Sects. 4.2 and 5.2. Besides the physical parameters, the vector h includes the wells drilling schedule. We assume that wells are open or closed in N a time instants, ðt 0 ; . . .; t N a À1 Þ, and, within any time interval a i ¼ ½t iÀ1 ; t i Þ, for i ¼ 1; . . .; N a , the well configuration does not change. W i is the set of locations of the active producers in the interval a i . In the context of a mature field, historical data are abundant, making possible to exploit spatial dependence among production rates in the prediction step. We present two applications in Sects. 4 and 5. In both cases, we aim to predict the production rates of a single-phase reservoir. However, in the first case the production is driven by injection, whereas in the second case the production is driven by rock compaction.

Governing equations of single-phase flow in a reservoir
We present the general equation for single-phase in a porous medium, which provides the basis for the physical models used in the two test cases described in Sects. 4 and 5. A petroleum reservoir is a sub-surface region containing hydrocarbons trapped in porous rocks. Although a reservoir usually contains water, oil and, depending on the temperature and pressure conditions, gas, in this work we restrict our analysis to single-phase isothermal flow conditions where production occurs because fluid is injected in the reservoir or because a combination of fluid expansion and rock compaction provides the necessary energy, see Dake (1978) for a review of these mechanisms. Neglecting poromechanics coupling, a single phase fluids flow in the reservoir is modeled according to the mass conservation PDE Àr Á ðquÞ Àm ¼ o ot ð/qÞ; where q ¼ qðpÞ is the mass density of the fluid, / ¼ /ðpÞ is the porosity, p is the pressure, u is the Darcy's velocity which can be related to fluid pressure, according to Bear (2013), by the Darcy's equation where l ¼ lðpÞ is the fluid viscosity and k is the permeability tensor, which is a local rock property that measures the ability of the fluid to flow through it. Combining the two equations we get the pressure equation Solving Equation (13) for pressure requires the definition of a thermodynamic model, q ¼ qðpÞ, and a compaction model, / ¼ /ðpÞ. These closure models depend on rock and fluid characteristics and are, thus, case-specific. Then, the time evolution of the fluid can be simulated provided that initial and boundary conditions are defined. In this work, we assume, as initial conditions, the equilibrium state of the reservoir before production and, as boundary conditions, that our reservoir is completely sealed from the rest of the subsurface. Furthermore, the mass source ratem can be written as the sum of the contributions of the N w wells, i.e.m ¼ P N w j¼1m j ðtÞdðs À s w j Þ, where dðsÞ is the Dirac's function, and s w j 2 D,m j ðtÞ are the positions and the mass rate produced/injected at time t at the j-th well. We assume that, in the development of the field, new wells can be drilled or existing wells can be closed in the reservoir.
The solution of equation (13) can be numerically computed under specific initial and boundary conditions and proper closure relation using a general purpose numerical simulator, such as OPM (Flemisch et al. 2011) or ECLIPSE (Schlumberger 2019), which essentially follows the techniques described in Aziz and Settari (1979); Lie (2019); Peaceman (1977): finite volume discretization of mass conservation equations on hexaedrons, fully implicit time-marching scheme for average fluid properties (cell pressure values) and well models based on the Peaceman's formulation (Peaceman et al. 1983) which links inflow/ outflow rates, cell pressure and bottom-hole fluid pressure p wf .

Single phase flow driven by injection
In this first test case, we consider a single-phase flow in a reservoir, in which the production is driven by fluid injection. Albeit being not realistic in the context of hydrocarbon recovery, as the liquid production is usually not driven by the injection of the same fluid, this situation can appear in the context of groundwater aquifer management (Jakeman et al. 2016) and it allows us to show the effectiveness of our approach in a relatively simple test case.

Dataset description
In order to ensure full disclosability, we analyze the production data produced by a synthetic reservoir simulated using OPM. The reservoir simulator OPM is employed only for the data generation and it does not play any role in the predictive model. Hereafter, we describe the characteristics of the synthetic reservoir in terms of geometry, fluid properties and well schedule. We consider a reservoir of dimensions 25600ft, 25600ft and 50ft along x-axis, yaxis and z-axis respectively, discretized with 128 Â 128 Â 1 cells, whose dimensions are 200ft, 200ft and 50ft, respectively. We generate the reservoir porosity / 0 as the logistic transformation of a realization of a random field with isotropic spherical variogram with range R ¼ 4000 ft, sill r 2 ¼ 0:1 and zero nugget, which was linearly scaled in the range [0.1, 0.3]. We build the permeability tensor k as diagonal with components k x ¼ k y ¼ k, where k, expressed in milliDarcy, is obtained as log k ¼ log 40 þ 5ð/ 0 À 0:1Þ.
The resulting porosity and permeability fields are reported in the supplementary material in Figure S1. In this case the compaction model is simply / ¼ / 0 ½1 þ c rock ðp À p rock Þ while the water density is q ¼ q 0 ½1 þ c w ðp À p w Þ. More specifically, c rock ¼ 9:8141 Á 10 À7 psi À1 , c w ¼ 2:74 Á 10 À6 psi À1 , p rock ¼ 5801:51 psi and p w ¼ 3118:3 psi. Fluid viscosity l is set to l ¼ 0:39851 cP. A total number of 256 injectors and 225 producers are drilled in the reservoir in N a ¼ 8 activation time intervals of equal length. During each activation instant t i ¼ 0; . . .; 7, a group of injectors and producers is drilled, adding them to the already operating wells. The wells are placed according to a five-spot pattern, and their activation schedule is shown in Fig. 2. We impose a bottom hole pressure of 3000 psi for producers and of 5000 psi for injectors. The reservoir was implemented in OPM, obtaining the liquid production rates shown in Fig. 2.

Physical model
We now describe the physical model which shall be employed for this test case, analogously to Aziz and Settari (1979);Peaceman (1977). Even though fluid and rock in the reference model simulated using OPM are compressible, in the logic of our method we implement an approximate solution where fluid and rock are incompressible, so that q is constant and / does not play any role. Thus, equation (13) reduces to which can be seen as a conservation equation for the fluid volume at reservoir conditions. The boundary condition ensures zero net flow through the reservoir boundaries. Equation (14) is solved by means of a finite volume discretization technique (Aziz and Settari 1979). We employ a two-dimensional uniform grid, which consists of rectangular blocks of dimensions ðDx; DyÞ and we indicate with H the reservoir thickness. We use the same spatial discretization as the reservoir simulator described in Sect. 4.1, thus ðDx; DyÞ ¼ ð200ft; 200ftÞ and H ¼ 50ft. The numerical method results in solving a linear system of the form where A is a pentadiagonal matrix, p contains the unknown pressure numerical approximation in each block and b is the right-hand side which is zero everywhere except for the elements corresponding to blocks containing a well. Details are given in the supplementary material, Section S1. The linear system (15) gives, in each activation interval, an approximation p of the reservoir pressure p. Then, the prediction F s 0 ;a i ðhÞ is given by (14), where, with reference to equation (2), uðhÞ is approximated by the vector p and F s is the Peaceman's well model (Peaceman et al. 1983) applied to the approximated pressure p in the well location.
The model parameters h are the permeability tensor k, the fluid viscosity l, the active wells locations s w j , the wells connection factors T w and the bottom hole pressures p bhp w . In this test case, physical model predictions F s 0 ;a i are constant in the whole activation interval, being equation (14) time-independent.

Results and discussion
We now apply Phy-RK for the prediction of the liquid production rates simulated as described in Sect. 4.1, employing the physical model described in Sect. 4.2. For each activation phase, we compute the physical model predictions solving equation (14). The residuals are then predicted based only on the residuals of the previous activation interval according to model (11). We point out that, for each interval, we ignore the initial part of the data, i.e. the part which connects zero to the first observed value in the interval, as it is due only to the linear interpolation we perform on the simulated data to obtain a uniform time sampling.
We measure the relative error E between the observed data Y s 0 ;a i and the predictionŶ s 0 ;a i in the activation interval a i as The relative error E i is dimensionless and expresses the cumulative absolute difference between the observed and the predicted rates, normalized by the total cumulative observed production. We compare four approaches derived from the general predictor (11). The first one is the pure Ordinary Kriging (OK), in which functional Kriging, with only the intercept in the drift linear model (4), is employed, setting to zero the physical model predictions. The second one is the pure physical model (Phy), in which the predictions are F s 0 ;a i , setting to zero the predicted residuals. Moreover, two versions of the Physics-based Residual Kriging are tested, in which the method described in Sect. 2.2 is applied with two different drift terms respectively: for the Physics-based Residual Ordinary Kriging (Phy-ROK) we use only the intercept as regressor, thus m s ¼ b 0 , whereas for the Physics-based Residual Universal Kriging (Phy-RUK) the drift m s ¼ b 0 þ b 1 p s , p s being the numerical solution of the physical model (14) in the current interval a i evaluated at location s. Note that the approaches can be compared starting from the second activation interval. As we observe the production rates in following intervals, we could use Phy-RK with increasing number of residual terms. Selection of the number K i of residuals involved in predictor (11) for interval a i is done by looking at the best predictor for the previous interval a iÀ1 . This leads to K i ¼ 1 in all the intervals, for both Phy-ROK and Phy-RUK (see Fig. 3).
Rates predicted by Phy-RUK are shown in Figure S3 in the supplementary material, exhibiting a great agreement with the observations. Figure 4a displays the boxplots of the relative errors for each activation phase, comparing the different approaches. We note that, in this example, Phy has a lower prediction error than OK. However, combining the two approaches in the Phy-RK leads to remarkably lower errors. In particular, we observe that Phy-RUK, which uses the pressure as regressor, leads to consistently lower errors, showing that it can effectively exploit some information from the physical model in the geostatistical   Fig. 2 Simulated production rates and schedule for the reservoir described in Sect. 4.1.
Top panel: blue lines represent the production rates corresponding to newly drilled wells in each activation interval, gray lines correspond to wells already operating. Bottom panel: blue circles represent the newly drilled producers, red crosses the newly drilled injectors and grey symbols the already operating wells part of the model. Furthermore, we point out that the errors tend to be lower in the last activation intervals, when functional Kriging can take advantage from a higher number of observations. Note that, thanks to the sequential nature of our problem, we do not need cross-validation to select, among the four options, the model to employ in each interval. Indeed, we can choose the one that gives the lowest average relative error in the previous interval, assuming that there are no strong variations in the process that generates the data (both the physics and the stochastic process of the residuals) between consecutive time intervals. According to this criterion, Phy-RUK results to be the best model in all the intervals.

Robustness to the compressibility coefficient
We now want to investigate the robustness of our approach to changes in the fluid compressibility c w , generating a new set of production data with OPM-as described in Sect. 4.1-with a higher fluid compressibility. This analysis can be used to evaluate the ability of the method to correct a more and more approximated physics in the simplified modeling. We here discuss three scenarios, in which the data are simulated multiplying the fluid compressibility by a factor of 10, 100 and 1000, with a baseline compressibility of 2:74 Á 10 À6 psi À1 . Figure 4 shows the relative errors boxplots varying the water compressibility (the other cases are reported in the Relative Error [-] (b) c w = 2.74 · 10 −3 psi −1 Fig. 4 Relative errors boxplots of the predictions for the case described in Sect. 4 produced by OK, Phy, Phy-ROK and Phy-RUK varying the fluid compressibility c w . Note that the y-axis is represented on a log scale supplementary material, Figure S4). We observe that the physical models predictions F s 0 ;a i are unreliable when the water compressibility is misspecified. However when the predictions F s 0 ;a i are employed in the framework of the Phy-ROK and Phy-RUK, they still significantly contribute to decrease the errors. Furthermore, we note that the errors increase when the compressibility increases. In correspondence of c w ¼ 2:74 Á 10 À3 psi À1 , the OK approach produces errors which are comparable to Physics-based Residual Kriging. This suggests that, when the physical model becomes too inaccurate, there is no value-added in considering the term F s 0 ;a i in model (9), but the residual modelization can partially correct the predictions. Concerning this aspect, using a spatial regression model with differential penalization (SR-PDE, Arnone et al. 2019), with an inaccurate physics would lead to worse predictions than a regression model without physics penalization.
Indeed, in such a framework the functional data would be modeled as where f 0 is a spatial field and e a zero-mean residual, independent of the other residuals. The field f 0 is then estimated minimizing the following functional where k is a smoothing parameter, and of =ot þ Lf ¼ u is the time-dependent PDE that expresses the prior knowledge on the phenomenon under study. When the physics is misspecified in (16)-(17), the field estimated via SR-PDE is biased towards it, leading to worse predictions than a pure regression model. Instead, we showed that Phy-RK may benefit also from such an imprecise physical model, thanks to the geostatistical modeling of the residual field X s .

Permeability field degradation
Another important aspect to investigate is the partial knowledge of the parameters that play a role in the reservoir dynamics. We now assume that the permeability field is only partially known and we analyze two different scenarios. In the first one, we assume that the permeability field is corrupted by a spatially dependent noise, whereas in the second one, the permeability field is known only at the active well locations and the whole permeability field is obtained using scalar Kriging. In both the cases, Phy-ROK and Phy-RUK prove to be the best predictors, as discussed in details in the supplementary Section S2.

Single-phase flow driven by rock compaction
In the second test case, we deal with a single-phase flow in a reservoir which can be developed under primary depletion because production is supported firstly by rock compaction and then by fluid expansion, see Dake (1978). This problem idealizes typical production behavior in many real fields, where, for very long periods of time, production is solely due to the capability of the rock to compact and then provide energy to the fluid, see for instance the Ekofisk field (Sulak et al. 1991), the Valhall Field (Sulak et al. 1991) and a large part of the turbidite reservoirs in the Gulf of Mexico, (Morgenthaler et al. 2012).

Dataset description
Also for this test case, we consider data generated from the simulation of a synthetic reservoir, hereafter described. We rely on the ECLIPSE simulator, which fully implements rock compaction models, on the contrary to OPM simulator. As before, the ECLIPSE simulator is used only to produce the observed rates and not to generate the predictions. We consider the same reservoir illustrated in Sect. 4.1 in terms of geometry and petrophysical properties. The rock is considered to be compressible, with a porosity that depends on the pressure as /ðpÞ ¼ / 0 gðpÞ, where / 0 is the porosity at the reference pressure 7386 psi and gðpÞ is a pressure-dependent porosity multiplier, reported in Table 1. The fluid compressibility is set to 3:13 Á 10 À4 psi À1 and an initial reservoir pressure of 7000 psi is prescribed. During N a ¼ 8 activation intervals, 100 producers are incrementally drilled in the reservoir, according to Fig. 5, and they operate at a constant bottom hole pressure of 3000 psi. We implemented the reservoir setting in ECLIPSE, getting the observations of the production rates depicted in Fig. 5.

Physical model
We use a physical model which takes into account the rock compression, but in a simplified manner, whereas the fluid compression is not modeled. Starting from equation (13), we assume a constant fluid density q and a linear relation between porosity and pressure, with c rock constant in the considered time interval, / ¼ / 0 ½1 þ c rock ðp À p rock Þ, where p rock is the reference pressure, obtaining the following conservation equation Then, discretizing equation (18), on a Cartesian grid, analogously to Sect. 4.2, and adding the implicit discretization of the temporal derivative of time step Dt, we end up with the following linear system, where the unknown is the discretized pressure at time t þ Dt ðA þ DÞp tþDt ¼ Dp t þ b: A and the vector b are defined according to (15), whereas D is a diagonal matrix, whose elements are D ii ¼ V i;j / 0 c rock =Dt. Thus, we need to solve a linear system at each time step, starting from the initial pressure p a i 0 . The spatial discretization ðDx; DyÞ is set to (200ft, 200ft), with a reservoir thickness H ¼ 50ft. The time step Dt is set to 28.95 days (which means we need 25 steps to cover each activation interval a i ) and we prescribe an initial condition p a 1 0 ¼ 7000 psi for the first time interval. In order to obtain a physical model which produces meaningful predictions, we consider a constant rock compressibility in each time interval a i equal to the sum of the fluid compressibility and the derivative of the function gðpÞ evaluated at the initial pressure p a i 0 of each interval.

Results and discussion
For this problem, we apply Physics-based Residual Kriging analogously as in the first application, described in Sect. 4.3. We compare the same four approaches, namely, OK, Phy, Phy-ROK and Phy-RUK. In the case of Phy-RUK, the pressure computed solving equation (18), averaged over the i-th time interval, is used as covariate for the drift term. As in the first test case, we select the optimal number of residuals K i in each interval as the one that produced the lowest average relative error in the previous interval, according to Figure S5 in the supplementary material. A major difference is in the resulting choice of the number of residual terms K i . Indeed, in this case, it is often convenient to select more than one residual term (see Table 2). Note that selected K i is often coherent between the Phy-ROK and Phy-RUK approaches, except for slight discrepancies in the fourth and the sixth intervals. In the following, we assume that, in each time interval, the number of employed residual terms is chosen according to Table 2. In many intervals, the Phy-RK method leads to much lower errors than a pure data-driven or pure-physical approach (see Fig. 6a and Table 2). In fact, the Phy-RUK proves to be the best predictor in the first intervals, whereas in the last ones the best is Phy-ROK. This is due to the degradation of the physical model predictions starting from the fourth interval, that implies also a worse reservoir pressure approximation, impacting the Phy-RUK performances. Note also that the chosen method for prediction (marked as ''Selected model'' in Fig. 6a) is always optimal, except for the fourth interval, where the best model would be Phy-ROK but Phy-RUK is selected instead. In correspondence of this interval, we observe a change in the goodness of the physical model predictions, that consequently impacts the Phy-RK predictions, leading to similar errors between Phy, Phy-ROK and Phy-RUK in intervals 4 and 5. However, starting from the sixth interval, the gain of the Phy-ROK method over Phy becomes more and more substantial.
To conclude, the Phy-RK method leads to a large improvement over the OK method in all the intervals. The Phy predictor is comparable to Phy-RK only in intervals 4 and 5, with much higher errors in the others. In Fig. 6b we show the predictions computed by the selected best models in each interval for this test case, noting a great agreement between predictions and observations.

Conclusions
In this work, we presented the Physics-based Residual Kriging method, a novel approach for geostatistical problems in presence of prior information expressed by a physical model. As further extension, we formulated it for sequential problems, where the Physics-based Residual Kriging predictor can be incrementally enriched by adding residual terms at each activation time interval. We applied this approach for the prediction of production rates in a mature reservoir. With two examples, we showed that this approach leads to remarkably lower prediction errors when compared to a pure geostatistical approach or to a pure physical model. Furthermore, we tested the robustness of our approach to changes or partial knowledge of the physical model parameters. Our study shows that a Physics-based Residual Kriging approach is indeed beneficial also when an inaccurate physical model is employed in its formulation. For all these reasons, Phy-RK represents a valuable approach to geostatistical problems, whenever a (possibly approximated) physical model is available.
As future research, we plan to apply the Phy-RK methodology with sequential update to more realistic reservoirs, where two or three phases flows make the underlying physics more complex. In these contexts, one might avoid the burden of a 3D full-physics simulation, by  Fig. 6 a Relative errors boxplots of the predictions produced by OK, Phy, Phy-ROK and Phy-RUK. b Observed rates and rates predicted by the selected models for the case described in Sect. 5 employing a surrogate model, e.g. INSIM (Zhao et al. 2015) or FlowNet (Kiaerr et al. 2020), as the physics model term within Phy-RK approach. Availability of data and material Due to confidentiality concerns, the data are not made publicly available.
Code availability Due to confidentiality concerns, the code is not made publicly available.

Declarations
Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.