The use of flow diagnostics to rank model ensembles

Ensembles of geomodels provide an opportunity to investigate a range of parameters and possible operational outcomes for a reservoir. Full-featured dynamic modelling of all ensemble members is often computationally unfeasible, however some form of modelling, allowing us to discriminate between ensemble members based on their flow characteristics, is required. Flow diagnostics (based on a single-phase, steady-state simulation) can provide tools for analysing flow patterns in reservoir models but can be calculated in a much shorter time than a full-physics simulation. Heterogeneity measures derived from flow diagnostics can be used as proxies for oil recovery. More advanced flow diagnostic techniques can also be used to estimate recovery. With these tools we can rank ensemble members and choose a subset of models, representing a range of possible outcomes, which can then be simulated further. We demonstrate two types of flow diagnostics. The first are based on volume-averaged travel times, calculated on a cell by cell basis from a given flow field. The second use residence time distributions, which take longer to calculate but are more accurate and allow for direct estimation of recovery volumes. Additionally we have developed new metrics which work better for situations where we have a non-uniform initial saturation, e.g., a reservoir with an oil cap. Three different ensembles are analysed: Egg, Norne, and Brugge. Very good correlation, in terms of model ranking and recovery estimates, is found between flow diagnostics and full simulations for all three ensembles using both the cell-averaged and residence time based diagnostics.


Introduction
Ensemble modelling is becoming an increasingly popular part of petroleum reservoir simulation as a way of capturing uncertainty in reservoir parameters and assisting with decision making and forecasting. For a particular reservoir, an ensemble of equiprobable models can be created for each geological scenario, spanning the range of uncertainty in various parameters, such as porosity, permeability, etc. Use of multiple geological scenarios will often be essential to capture the uncertainty inherent to the reservoir [2] as, e.g., exemplified in the Watt Field study [1].
Combining simulation results from the whole ensemble should give a more accurate picture of the range of results which could be encountered during future operation of the field. However, full multiphase simulations of Francesca Watson francesca.watson@sintef.no 1 Mathematics and Cybernetics, SINTEF Digital, P.O. Box 124 Blindern, N-0314 Oslo, Norway highly-detailed 3D reservoir models can be computationally expensive to run and this tends to put limits on the parameter ranges that can be investigated. Thus, simplified models are required as a way of ranking models and approximating the range of uncertainty, before further in-depth analysis can be carried out on a few selected models.
Flow diagnostics provide a way to quantify and visually inspect flow fields in a fraction of the time it would take to carry out a full physics simulation. The prerequisite is a set of representative fluxes. From these we can obtain a set of visually intuitive quantities that give the travel time for mass-less particles that passively follow the flow field from an injector into the reservoir and from a point in the reservoir to the nearest producer; delineate regions drained by specific producers or swept (flooded) by specific injectors; determine whether pairs of injectors and producers communicate or not and measure the relative strength of their connection; determine how flux is allocated between different injectors and producers; establish the volumetric region influenced by specific well-pairs; give the distribution of residence-times for all flow paths connecting injectors and producers; measure the dynamic heterogeneity within drainage, sweep, or well-pair regions; or compute simple estimates of economic measures such as net present value. Most of these quantities have traditionally been associated with streamline simulation [4,27,[32][33][34], but it has also been shown that similar quantities can be computed equally well using standard finite-volume methods [20,25,26].
Flow diagnostics are quick to compute and can thus be used to interactively explore fluid communication in a geological model before or after running more comprehensive multiphase flow simulations. Quantities that measure the heterogeneity in flow paths naturally suggest themselves as proxies characterising flow in the reservoir, as this heterogeneity will have a large influence on how a reservoir might be swept by invading fluids (e.g., injected water) and how the in-situ fluids might be displaced and produced. Representative flow fields necessary to compute flow diagnostics can in many cases be computed from a single-phase, incompressible flow equation, or from a single multiphase pressure solve if the initial fluid distribution and necessary fluid data are available. This way, both the flow simulation, and the subsequent flow diagnostics calculations are very quick to run. Although using a simplified model means results are not quantitatively correct, the relative heterogeneity between models can still be estimated. Various types of flow diagnostics computed with finite-volume methods have been utilized to develop proxies that differentiate between macroscopic and microscopic sweep improvements resulting from polymer injection [13], to optimize waterflood performance [5,19,20], to validate rapid prototyping of reservoir models [9], in production data integration [23], to rank downscaled models in chemical EOR [35] or validate upscaling methods [16,22], and to cluster [36] or initialize [3] data-driven models, to mention a few applications.
Flow diagnostics can be especially useful when faced with an ensemble of models of a reservoir that attempt to span the possible range of uncertainty; see e.g., [12,[29][30][31]37]. For a large ensemble it would be unfeasible to run a full simulation for every realisation, and even more so in the case of multiple geological scenarios that each have an associated ensemble of realizations. Using flow diagnostics, on the other hand, it is possible to gain insight into the flow behaviour of many different models in a fraction of the time it would take to simulate them all individually. Flow diagnostics is not unique to this end; examples of other approaches to rank or prescreen ensemble models at a low computational cost that are somehow similar in spirit include the use of linearized flow models to identify worst cases [7] and the use of fast marching combined with the concepts of depth of investigation and diffusive time-of-flight for unconventional reservoirs [38,39].
In this paper, we review the basic concepts of flow diagnostics and demonstrate how they can be used to rank and quickly choose realizations that are likely to span the range of uncertainty present and are therefore good candidates for further investigation. We also present new formulations that improve proxies of multiphase performance measures and extend them to cases with non-uniform initial saturation. To assess to what extent these various flow-diagnostics techniques can be used to robustly rank ensemble models, we apply them to three different cases: the conceptual Egg model [10], the Norne field from the Norwegian North Sea (ensemble generated with the code from [18], but without some of the fault parameters), and the model ensemble from the Brugge benchmark [24]. All methods discussed have been implemented in the diagnostics module of the opensource MATLAB Reservoir Simulation Toolbox (MRST). Features are freely available online in MRST 2020a (and subsequent versions) [28] along with the graphical user interfaces used to visualise and inspect flow diagnostics results for ensemble members [17].

Basic flow diagnostics
To make the paper more self contained, we first review basic quantities used in flow diagnostics before discussing extensions. A more detailed introduction can be found in the recent textbook by Lie [15,Chapter 13] or in one of the original articles that introduce these methods in a finitevolume setting [20,26].

Time-of-flight
Flow diagnostics rely on two basic quantities: time-offlight and influence regions. To define these, we assume a reservoir with porosity φ and a superficial Darcy velocity v defined as volumetric discharge per area. For ease of exposition, we assume for the moment that v is steady and divergence free, i.e., ∇ · v(x) = 0 for all x ∈ Ω. Through each point x there will be a unique curve that starts at the nearest injector (or fluid source), terminates at the nearest producer (or fluid sink), and is tangential to the velocity field at every point. We can parameterize this streamline Ψ by its arc length r. In streamline simulation [4], however, it is more common to use time-of-flight (TOF) as spatial coordinate, which measures the travel time of a theoretical particle by the interstitial fluid velocity v/φ. We can define the TOF coordinate τ through two mathematically equivalent equations: In flow diagnostics, we usually refer to two different TOF values: forward TOF, τ f , defined as the travel time from the nearest injector to a given point in the reservoir, and backward TOF, τ b , defined as the travel time from the given point to the nearest producer. The total travel time along a streamline from inflow to outflow is called residence time and equals the sum of τ f and τ b . Forward and backward TOF are useful to identify regions that are likely to remain unswept or undrained, whereas the residence time distinguishes high-flow and stagnant regions.
As an alternative to tracing streamlines and evaluating the integral in Eq. 1 to compute point values of τ , we can solve the second equation in Eq. 1 using a standard finite-volume method, as first proposed by [21] and [6]. This gives us volume-averaged values of τ , which means that the discrete τ -values represent volumetric averages of τ over all streamline segments crossing individual cells. Cell-averaged τ -values may be more difficult to interpret conceptually, but have the advantage that they are directly linked to how the propagation of a piston-like displacement front would be resolved by a standard multiphase simulator.

Influence regions and volumetric partitions
Influence regions are defined as the locus of the points that lie on all streamlines emanating from a point set whose influence we want to trace, typically a well or a segment of a well. Since each point can only lie on a single streamline, these influence regions are distinct volumetric objects that together form a partition of unity. Mathematically, we can define an influence region as the solution to the following steady advection equation This equation will only make sense if the inflow condition is set only on parts of the inflow boundary (e.g., specific wells or a group of wells), since otherwise C would be identical to one inside a connected domain. The quantity C is sometimes referred to as a steady tracer concentration, but in the current context it is a purely theoretical quantity used to delineate reservoir volumes and should not be confused with the concentrations of physical tracers. When Eq. 2 is solved with a finite-volume method, the computed values will lie in the interval [0, 1] and represent the portion of the total fluid volume passing through each grid cell that can be attributed to the point set defining the inflow condition. Influence regions computed with a standard finite-volume method will contain significant numerical diffusion and will thus not be sharply defined. However, they will give the regions influenced by passively advected quantities in a standard multiphase simulation.
Influence regions naturally lead to volumetric partitions of the reservoir volume. Sweep regions are simply defined as the influence regions of individual injectors, whereas drainage regions are defined by solving Eq. 2 with a reversed flow field and inflow conditions set at producers. Intersecting influence region from injectors and producers gives us well-pair regions, well-allocation factors, the relative communication strength of individual well pairs, etc. More details are found in [15,Chapter 13] and [20].
Because each cell can be part of several influence regions, it is often advantageous to compute time-of-flight associated with each influence region C by solving a discrete version of to reduce undesirable averaging effects. Here, Γ i denotes the part of the inflow boundary that defines the influence region, which is assumed to satisfy Eq. 2 with C| Γ i = 1.
That is, we solve for τ C and back out τ = (τ C)/C in all cells with positive C-values.

Residence-time distributions
The heterogeneity of a displacement process is linked with the variability in travel times through the reservoir, e.g., the distribution of τ at the outlet boundary, which we refer to as residence-time-distribution (RTD). For the outlet boundary Γ o , the normalized RTD can be defined as where δ is the Dirac delta function, τ f is the solution of the TOF-Eq. 1 on Γ o parameterized by the surface coordinate σ , and n is the outward pointing unit normal. It follows that the normalization factor F o equals the total flow rate (allocation) from Γ i to Γ o . This formulation is not particularly useful for discrete approximations (as we discuss subsequently). We can instead compute the RTD by tracing a unit pulse from Γ i to Γ o , i.e., by solving the linear transport equation and letting The normalization F o is defined so that ∞ 0 T o dt = 1 and equals the total volumetric flow. The equivalence of Eqs. 4 and 6 follows from the fact that c(x, t) = δ(t − τ f (x)) is indeed the solution of Eq. 6, see Eq. 24 in the Appendix. Furthermore, influence region Eq. 2 and time-of-flight Eq. 1 are zeroth and first order moment equations of Eq. 5 as shown in the Appendix, so

Measures of dynamic heterogeneity: F-Φ diagram and Lorenz coefficient
Classical sweep theory includes a number of static measures for characterizing heterogeneity, such as flow and storage capacity, the Lorenz coefficient, the Koval factor, and Dykstra-Parson's permeability variation coefficient; see e.g., [14] for a more comprehensive overview. Flow diagnostics reinterprets these measures in a dynamic setting so that they characterize the heterogeneity in flow paths rather than the heterogeneity in static petrophysical properties. Large dynamic heterogeneity means large variations in the length of flow paths between injectors and producers, which in a water-or gas-flooding scenario typically manifests itself in early breakthrough of injected fluids.
To define such measures, we start by defining the flow capacity F (t) and the storage capacity Φ as The normalization term Φ o is the total pore volume, since Flow and storage capacities can also be computed directly from cell-averaged τ -values, as explained later on in the text and more thoroughly in Section 13.2 of [15].
is often used to illustrate dynamic heterogeneity. If we think of the reservoir as a bundle of streamtubes, with a piston-displacement process taking place inside each streamtube, Φ(t) denotes the volume fraction of the streamtubes that have broken through by time t, whereas F (t) is the corresponding fractional flow of the displacing fluid. If all flow paths have the same residence time, F (Φ) = Φ is a straight line, but in the general case, F (Φ) ≥ Φ is a concave curve. The more concave the curve is, the larger is the spread in residence times, from fast flow paths that break through early (small values of Φ) to slow flow paths that may take for ever to break through (Φ close to unity). This spread is usually characterized by the Lorenz coefficient L c , which measures twice the area between the concave F (Φ) curve and the straight line F = Φ, i.e., This means that L c = 0 for a homogeneous displacement and L c = 1 for an infinitely heterogeneous displacement. Previous experience has shown that L c correlates very well with forecasts of hydrocarbon recovery predicted by multiphase flow simulations and hence can be used as an effective flow proxy in various reservoir management workflows [20,26,27]. Flow and storage capacities can also be calculated directly from the cell-averaged τ -values computed by a finite-volume discretization of the TOF Eq. 1, as first suggested by Shahvali et al. [26]. Assuming that the cells have been sorted in ascending order by their average residence timesτ j , so thatτ 1 ≤τ 2 ≤ · · · ≤τ N , where V j is the pore volume of cell j and q j = V j /τ j is the flow rate into the cell. As discussed in [13,15], flow and storage capacity values obtained from Eq. 11 can differ substantially from those obtained by simulating the pulse Eq. 5. The reason is thatτ equals the mean of the RTD distribution inside each cell, which can be very different from the first mode. However, Lorenz coefficients and other heterogeneity measures obtained from Eq. 11 are still suited for ranking as they correlate well with full simulation output.

Sweep efficiency and fractional recovery
An alternative flow proxy is the volumetric sweep efficiency E v (t), which measures how efficiently injected fluids are used. It is defined as the volume fraction of in-place fluid that has been displaced by injected fluid by time t, or equivalently, by the ratio between the volume contacted by the displacing fluid at time t and the volume contacted at time t = ∞. The normalized sweep efficiency can be expressed in terms of τ f (12) or computed directly from F and Φ by the following formula (see Eqs. 27-28 in the Appendix) In a completely homogeneous piston displacement, the inplace volume will be displaced by time t D , and thus t D represents units of pore volume injected. Before first breakthrough (t < min τ f ), E v equals the injected pore volume t D . After breakthrough, Φ is the volume fraction of flow paths that have been fully swept, whereas (1 − F )t D is the volume fraction contributed by the swept parts of the flow paths that have not yet broken through. This means that, E v (t D ) is a concave curve bounded above by y = min(x, 1). This curve highlights fluid responses after first breakthrough and can be used as a proxy for recovery factor. Fractional recovery curves, i.e., (1 − F ) as function of t D , emphasize breakthrough behavior and can have utility as a proxy for early-time fractional recovery (see Figure 13.6 of [15]). The heterogeneity measures described so far correlate well with recovery factors predicted by traditional multiphase simulators as long as the initial fluid distribution is relatively uniform. The effect of spatially varying saturations on time-of-flight is easily incorporated into the flow diagnostics analysis by solving a multiphase pressure equation with mobilities evaluated from the initial fluid distribution to compute a representative flow field for the time-of-flight Eq. 1, which is what we do herein. The challenge arises if the reservoir includes a significant stationary or semi-stationary water zone because flow paths that traverse this zone do not directly affect the displacement of hydrocarbons and should therefore not be included in the calculation of heterogeneity measures. For nonuniform initial saturations, as typically seen in field models, it will usually be desirable to restrict the heterogeneity measures to, e.g., the initial hydrocarbon volumes, to improve the correlation with the true recovery factor. Variants of this approach have been suggested for the Lorenz coefficient [8,20] but have the weakness that they only consider the heterogeneity in the oil zone and not how far, measured in τ b , the oil is from a producer. Sweep efficiency, however, generalizes more naturally to individual phases. The oleic sweep efficiency, for instance, can be expressed as where S o is the oil saturation and V o = Ω φS o dx is the total oil volume. In this definition, we have used the backward time-of-flight (τ b ) as we wish to approximate the oil volumes produced at time t rather than the oil volumes displaced (as would be the case using τ f ). For the total sweep Eq. 12, however, the forward and backward expressions are equivalent.
We can also compute approximations to sweep efficiency from cell-averaged τ-values by using Eq. 12 directly such that where the cell indices are sorted according to ascending τ f values. Similarly, the oleic sweep efficiency Eq. 14 can be approximated by where S o,j is the oil saturation of grid cell j , and cell indices are sorted by ascending τ b -values.

Multiphase performance measures
If residence-time distributions are available, it is possible to go one step further than Eq. 14 in approximating recovery by considering the evolution of multiphase fluid saturations along the time-of-flight variable. This is analogous to what is done in streamline simulation, as we regard an interaction region as an implicitly defined flow unit. The main difference is that here we account for varying arrival times by employing the RTD in the 1D transformed transport equations. An earlier variant of this idea was reported by [13]. Assume a steady-state flow situation in a region with total allocation rate F o . A given 3D saturation field S w,0 (x) can be mapped to the 1D time-of-flight by (see Eq. 25 in the Appendix for an explanation) Hence, S w,0 (τ f ) represents the average saturation over all x in the level set τ (x) = τ f within Ω.

1D solution along forward time-of-flight
When solving for water transport along the forward timeof-flight, the RTD only appears in the evaluation of production rates. Hence the transport equation along τ f simply becomes where f w is the water fractional flow function. The approximate water production rate q w then becomes Since we assume a steady-state situation, the approximation of oil rate in a two-phase scenario is then simply given For an interaction region having large variance in its RTD, the production approximation Eq. 19 may suffer from significant smearing effects compared to a full resolution simulation. One option to ameliorate such effects (but at a higher cost), would be to split the RTD into several bins and solve a transport equation for each bin. As such an approach is refined, it eventually becomes equivalent to streamline simulation.

1D solution along reverse direction of backward time-of-flight
The challenge with solving 1D transport equations along forward time-of-flight is that we average both over flow paths that will reach producers within a relevant timeframe and over flow paths that will not. Alternatively, one can solve for saturation along the reverse direction of the backward time-of-flight (τ b ) instead. This way, we only average the 1D transport solutions over fluid volumes that are likely to reach the well over the production time horizon. Analogously to Eq. 17, we map the 3D saturation field to the reverse time-of-flight by employing the backward tracer pulse response c b to obtain S w,0 (τ b ). For the 1D transport equation, the RTD now becomes the source term, whereas the flux term must be modified to account for the amount of flow having broken through. Accordingly, where F (τ b ) is the flow capacity function Eq. 8. Finally the approximate water rate becomes As shown in the last example in the next section, the version based on backward time-of-flight gives superior results for cases with a nonuniform initial saturation distribution.

Results
In this section, we apply the flow diagnostics just outlined to three different ensemble models. Our purpose is in part to demonstrate the utility of flow diagnostics as a tool to prescreen and rank ensemble models and in part to investigate to what extent the dynamic heterogeneity and multiphase performance measures can be used as less computationally costly, reduced-physics (proxy) alternatives to traditional multiphase flow simulation. All simulations and flow diagnostics were computed using the MATLAB Reservoir Simulation Toolbox (MRST) [15,17].

The Egg model
The Egg model [10] is a synthetic reservoir model that consists of one hundred ensemble realisations of a channelised reservoir. We use the first fifty realisations in our analysis. The model has seven horizontal layers with varying permeability, however, permeability is strongly correlated between the layers, leading to an almost 2.5D permeability field. Porosity is constant therefore all members have the same total pore volume. The waterflooding scenario described in the default setup comprises eight injectors with specified water injection rate and four producers with specified bottom-hole pressure; see Fig. 1. Initially, the reservoir is filled with oil and a connate water saturation of 0.2; the residual oil saturation is 0.25. The oil and water phases have cubic and quartic relative permeabilities curves and 5 cP and 1 cP viscosities, respectively. The strong permeability contrast between the meandering channels and the low-permeability background implies that the overall displacement efficiency to a large degree is determined by how well injectors and producers are connected: if there is a direct connection through a channel, we should expect early water breakthrough and poor areal sweep. It is therefore reasonable to expect that measures of dynamic heterogeneity should be able to rank the ensemble and distinguish high, low, and mid-range performers. One can, for instance, use a cross plot of sweep efficiency versus Lorenz coefficient, as shown to the left in Fig. 2, to identify outliers with high sweep/low Lorenz and low sweep/high Lorenz (numbers 22 and 38) that are likely to represent the best and worst reservoir performance, respectively. Closer inspection of the two extreme ensemble members reveals that number 22 has no direct channel connection between injectors and producers, whereas number 38 has several such connections and thus will see a much earlier water breakthrough (Fig. 1). Differences in flow behaviour between the models can clearly be seen from the RTD distribution in the upper-right plot as well as the unbalanced sweep regions. Figure 2 also highlights the flow diagnostics for the other two realizations shown in Fig. 1. Even though both come out as mid-range performers, measured in terms of dynamic heterogeneity and our simple proxy for net-present value (NPV), the residence-time distribution indicates that their dynamic behavior will be distinctly different. Realization number 19 has the earliest water breakthrough among all the ensemble members because of a high-permeability connection between INJECT04 and PROD03, but a plot of the estimated oil rate (not reported here) indicates that it later may be able to maintain a slightly higher oil production than realization number 8. Altogether, this demonstrates the importance of considering more than one type of flow diagnostics, as they measure/emphasize different aspects of how heterogeneity affects the displacement process.
Having seen how measures of dynamic heterogeneity clearly distinguish different realizations, the next question to ask is to what extent these diagnostics can supplement or replace rank measures from full dynamics multiphase That is, we must investigate correlation, preservation of rank, and how much recovery estimates from flow diagnostics deviate from their true multiphase equivalents.
We start by comparing recovery estimates Eq. 22 for the four selected ensemble members with recovery predicted by full two-phase simulations. Figure 3 also reports the total water cut over all wells for each ensemble member. Overall, the match between full simulations and flow diagnostics is good. Initially, total recovery is the same for both full simulations and flow diagnostics due to uniform saturation around the producers prior to water breakthrough. Water breakthrough occurs first for the flow diagnostics as a result of numerical diffusion in the 1D simulation used to evaluate phase rates from the RTDs. This smears out the advancing front, causing it to reach the producers slightly earlier than in the simulations. At later times, the recovery from flow diagnostics overestimates simulated recovery because the pulse-tracer simulations used to compute the RTDs do not account for coupling between flow and transport. Hence, flow diagnostics is unable to resolve two-phase flow effects like preferential flow paths induced by mobility differences, which reduce the volume of the reservoir being swept.
In an ensemble setting, we expect to have good correlations for the whole ensemble between the simulation metric in question and the flow-diagnostic metric to be used as a proxy. Here, a good match has been found between flow To further investigate the differences between flow diagnostics and simulation results through time we have calculated the Pearson correlation coefficient between results at different times (see Fig. 5). A correlation coefficient greater than 0.9 is considered a good correlation, meaning flow diagnostics results have effectively captured the ranking of models given by simulation results. Up until around 1 PVI, we find good correlations at t D,sim = t D,F D , as expected. Beyond 1 PVI this trend flattens out so simulated rankings correlate best with flow-diagnostics rankings at 1 PVI. This can be explained by the fact that simulated rankings do not change beyond this point even though absolute values are likely to increase over time. (Ranking based on flow diagnostics, on the other hand, does not exhibit the same good auto-correlation after 1 PVI.) As simpler alternative to the recovery estimates is to use a dynamic heterogeneity measure like sweep efficiency or Lorenz coefficient. The left plot in Fig. 6 shows that the cross-correlation between simulated recovery factor and sweep efficiency exhibit a similar overall trend as observed for the recovery estimates in Fig. 5, except for the fact that best correlations systematically lie off the central axis. This has a simple explanation. For a piston-like displacement (no mobility contrasts, linear relative permeability curves), in a single region between an injector and a producer, the 1 The Pearson coefficient ρ measures linear correlation between two data sets, defined as the their covariance divided by the product of their respective variations. Correlations corresponding to ρ = ±1 correspond to data points lying on an exact line. The Spearman coefficient assesses to what extent data show monotonic covariation. flooding front will reach the producer after 1 PVI. For the two-phase fluid properties used in the Egg model, however, the leading displacement front travels 2.72 times faster than a piston displacement front. Therefore, the simulated recovery factor at q PVI will correlate best to the sweep calculated from flow diagnostics at 2.72 q PVI.
When sweep efficiency is computed from cell-averaged τ -values, we only compute the mean residence times for each grid cell and fail to capture the fastest flow paths that give early breakthrough. Hence, the region of best correlation will therefore be shifted more towards the central axis and its placement is somewhat harder to predict than for the RTD diagnostics. Empirical evidence we have gathered by studying different models and scenarios suggests using sweep at 1 PVI as a good rule of thumb for cases with uniform initial saturation. As an example, Fig. 7 demonstrates how correlation improves by using sweep at 1 PVI instead of at 0.5 PVI.
In previous research [20], we observed that the Lorenz coefficient worked well as a simple proxy for recovery when used, for instance, to optimize well placement (see also [11]). For the Egg model, however, correlations for the Lorenz coefficient were generally inferior to sweep efficiency and are not included.

The Norne field
Our next example is slightly more complicated than the Egg ensemble. A benchmark model for the Norne field in the Norwegian North Sea is available from the Open Porous Media project. The reservoir consists of twenty-two different reservoir zones and has considerable stratigraphic and structural complexity. We used code based on [18] to derive an ensemble of 100 models, with varied permeability, porosity, net to gross, and vertical transmissibility multipliers between layers. In the corner-point model of the Norne   field, there is a large variation in permeability in different horizontal layers and this layering structure varies between ensemble members (Fig. 8), leading to more complex 3D flow patterns compared to the Egg model. For simplicity, we assume that the reservoir is fully saturated with oil and set up a hypothetical waterflooding problem driven by a pattern consisting of six injection wells controlled by constant injection rate and five producers controlled by constant bottom-hole pressure. Relative permeability curves for both oil and water are quadratic with zero residual oil and connate water saturations. Viscosities are 5 cP and 1 cP for oil and water respectively. Figure 9 reports various diagnostics for the whole ensemble.
Good correlations can be seen between the simulated recoveries and flow diagnostic measures such as recovery, recovery factor, and sweep efficiency, all estimated from RTDs. Two such correlations are reported in Fig. 10. Sweep efficiencies measured from cell-averaged τ -values also correlate well, and the correlation coefficients are only slightly worse than for the RTD-derived measures.
The Lorenz coefficient also correlates well with simulated recoveries (see Fig. 11), as long as these are sampled at the same dimensionless times and the Lorenz coefficient is evaluated at "time infinity", i.e., using the whole RTD distribution or the whole span of cell-averaged τvalues available. Why these two prerequisites are necessary follows from the definition of the coefficient, which measures the relative spread in the distribution of flow rates q as function of the volume V of the associated flow paths (think of streamtubes). Recall that the residence time is defined as τ = V /q. If two realizations have different pore volume or different total field injection rates, the same instance in time will map to different τ -values. It is therefore important to align the simulated recoveries in dimensionless time (pore volumes injected) before trying to compare them using a relative flow-diagnostics quan-  tity. Likewise, since the Lorenz coefficient measures how much the dynamic heterogeneity departs from homogeneity, i.e., how the residence times of the flow paths deviate from being uniform, the corresponding ranking will not be correct if we truncate the underlying RTD at a too small τ -value.

The Brugge field
The last example increases the complexity further towards full realism by introducing a non-uniform initial saturation. The Brugge ensemble comprises 104 realisations of a synthetic reservoir model [24]. The reservoir has a twophase initial condition with an oil cap at the top of the reservoir and water underneath, separated by a sharp interface. There are 10 rate-controlled injectors surrounding the oil cap and 20 producers controlled by bottom-hole pressure within the oil cap (Fig. 12). Relative permeability curves for both oil and water are quadratic with zero residual oil and connate water saturations. Viscosities are 5 cP and 1 cP for oil and water respectively. With this well configuration most of the flow takes place within a small part of the reservoir, between the injectors and the producers. However, cell-averaged time-of-flight values will take into account flow paths that have really long residence times as they pass through the relatively stagnant water zone. This means that metrics which take the whole volume of the reservoir into account, such as the Lorenz coefficient, do not function well as proxies for heterogeneity, as they include information from a large part of the reservoir that is not involved in the majority of the flow.  Estimating recovery from a 1D displacement solution calculated along forward time-of-flight (using Eqs. 18 and 19) requires the average initial saturation between the injector and the producer. For a non-uniform initial saturation, this can introduce a large error in recovery estimates, as shown in the left plot of Fig. 13, which in turn will lead to poor recovery estimates and model ranking. Recovery estimates calculated using flow rates computed from a displacement problem formulated using backward time-of-flight, i.e., using Eqs. 20 and 21, are much closer and show good ranking of models, as demonstrated in the right plot of Fig. 13. These observations are corroborated by Fig. 14. The figure also attests that discrete estimates of oleic sweep calculated using backward time-of-flight Eq. 16 are a very good proxy for ranking models based on recovery factor. Figure 15 shows that rankings from the calculations of recovery based on backward time-of-flight are good for a large range of times, suggesting ranking between the simulations and flow diagnostics changes very little through time, although obviously total recovery will be different at different times. The formulation based on forward time-offlight, on the other hand, which was so effective for both the Egg and the Norne models, is not usable at all when the saturation distribution along and across different flow paths is far from uniform, as is the case here because of the large water zone.

Conclusions, recommendations, and outlook
The purpose of this work is to investigate how well a flow diagnostics proxy can rank models in an ensemble in the same order as would be given when ranking the models by simulation results. This can be determined by looking at the correlation between a specific flow-diagnostics proxy and a simulation parameter (generally recovery or recovery factor). Care must be taken to ensure that the metrics being compared are of a similar type. For instance simulated recovery factor is a normalised variable, which can be compared with normalised flow-diagnostics measures such as recovery factor, sweep efficiency, or Lorenz Coefficient, whereas simulated recovery should only be compared with estimated recovery.
For the three ensemble case studies used here we have found good correlations between simulation results and appropriate flow-diagnostics proxies at specific times. Both the RTD flow diagnostics and flow diagnostics based on cell-averaged quantities show good capability for achieving similar rankings to simulations, for most cases considered herein Pearson (and Spearman) correlation coefficients have been greater than 0.9.
The most accurate, but also most computationally costly flow diagnostics are recovery and recovery factors estimated from representative 1D displacement simulations along time-of-flight. These should generally be computed using our new backward time-of-flight formulation, as this accounts better for nonuniform initial saturations. For simple scenarios, like the Egg model, with flow paths primarily driven by heterogeneity and pressure drop, and for which multiphase effects coming from the coupling of flow and transport are not particularly prominent, these measures will not only correlate very well with simulations but also provide quite accurate predictions. For cases with more tortuous flow paths and stronger coupling effects, the best one generally can hope for is good correlation, as observed for both the Norne and Brugge ensembles, so that the diagnostics can be used to compare and rank models, perturb underlying parameters in optimization loops, etc. A necessary prerequisite for this is that we can compute representative flow fields for the pulse-tracer simulations, which may be challenging for cases with complex production scenarios with wells coming on and off throughout the time horizon of interest. Note also that correlations between results are generally best when comparing FD results and simulation results at the same time-of-flight / simulation time, indicating that the ranking of ensemble members changes in a similar way through time for both the flow diagnostics and the simulations. This is particularly true early in simulations before multiphase coupling effects have become established.
In most cases, a simpler and viable alternative is to use a heterogeneity measure like sweep efficiency, which can be computed both from tracer-pulse RTDs and from the basic cell-averaged τ -values. The basic diagnostics are much quicker to calculate than RTD diagnostics. Ranking results from basic diagnostics are almost as good as results from RTD diagnostics, so for large ensembles it is possible to use the basic diagnostics to rank models and still be able to capture the range of uncertainty in results and identify ensemble members that are likely to perform better or worse in a simulation. These models can then be carried forward and simulated further.
For sweep efficiency it is important to remember that multiphase displacement fronts travel faster and will at the same instance in time have swept a larger region than a theoretical fluid particle whose passive advection follows the time lines defined by time-of-flight. This will affect and shift the best match-up between simulations and flow diagnostics. In practice, however, we have found that using sweep efficiency at one pore volume injected is a very good recovery proxy. Likewise, when using Lorenz coefficient as a proxy, it is important to not truncate the underlying residence-time distribution. (As a simple rule of thumb, the RTD should span at least 50 PVI.) In the ensembles considered herein, we have assumed that variation in heterogeneity is only reflected in permeability and porosity (or in associated quantities like net to gross or multipliers). With this assumption, it does not matter for the ranking capabilities whether the ensemble members have been generated around a single base case or from multiple geological scenarios. However, if the fluid volumes in place vary largely within the realizations to be ranked, it will not make sense to use absolute flow diagnostic quantities that, e.g., measure oil recovered. We nonetheless believe that it should still be both possible and reasonable to apply relative measures like Lorenz coefficient, sweep efficiency, etc, but this remains to be fully verified.
In our experiments, we have assumed single rock-typing (i.e., a single set of residual saturations and relative permeability curves) is applicable to all ensemble members or that the properties associated with different rock types exhibit so small variation that they can be approximated well by a single (average) set of rock-fluid properties. Time-of-flight, influence regions, and derived heterogeneity measures can be computed for scenarios with multiple rock types that exhibit significant property variation, but will not account in local variations in propagation speeds for multiphase displacement fronts. As a first step in this direction, one can introduce fluid acceleration factors A for each rock type. These should measure how fast a multiphase displacement front moves relative to the flow field and can be determined from the self-similar 1D Buckley-Leverett profile associated with each rock type. For two-phase incompressible flow, the propagation speed equals the maximum derivative of the upper concave envelope of the corresponding fractional flow function. (If similar acceleration factors are needed to mimic more complex flow physics, 1D numerical solutions will generally be necessary.) The acceleration factors can then be incorporated into an alternative travel-time equation on the form, v · ∇τ = φ/A(x). The modified travel timeτ should work as input to the basic flow diagnostic quantities described in Section 2, but more research is necessary to verify this. Moreover, it is not clear how to extend the multiphase performance measures described in Section 3 because with multiple rock types it will be difficult to determine a representative 1D solution that can be mapped onto the travel times.
Similarly, by integrating the product of Eq. 5 with t, we obtain In other words, m 1 satisfies the time-of-flight Eq. 1, which we can interpret as follows To represent the solution of Eq. 5, we first observe that if we disregard domain boundary conditions, then any function of the form c(x, t) = g t − τ f (x) , where g : R → R is a differentiable function, is a solution of Eq. 5. By substitution, we have (24) Since we can regard the delta function as the limit of a sequence of differentiable functions, it follows that the solution of Eq. 5 can be represented as c(x, t) = δ (t − τ (x)).
By conservation of mass, we have  (25) which leads directly to the following relation between flow capacity and sweep efficiency: From this relation and the definitions of flow and storage capacities Eq. 8, we get the following alternative RTDrepresentation: Furthermore, we obtain the relation [15] E Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.