1 Introduction

Robust predictions derived from numerical modelling of underground CO2 storage in porous media rely on suitable characterisation of the reservoir, on capturing all of the important physical processes involved, and finally on accurate numerical solutions of the governing physics. Each of these essential steps can be subject to uncertainties, and often requires decisions to be made to simplify the analysis in order to make the modelling tractable. The basis of these simplifying assumptions is often the judgment of the user, informed by experience, or simply a lack of reliable data for key properties. In practice, it is usually difficult to assess the reliability of numerical simulation predictions due to uncertainties in the geological models and the difficulty in obtaining high-quality experimental evidence.

Several previous studies have considered well defined models in order to compare the accuracy of numerical simulators (Pruess et al. 2004; Class et al. 2009). Even in these cases, where there is no uncertainty in the model, results between groups can differ due to choices made by each participant, even when using the same code. These uncertainties were considered in a subsequent benchmark study on CO2 storage (Nordbotten et al. 2012), where variability between participants was categorised into several sources: the interpretation of the problem, the physical processes included, how properties are modelled through different length scales, and the numerical modelling details.

The FluidFlower international benchmark study (Nordbotten et al. 2022) provides a novel opportunity to compare the impact of these different modelling choices on the robustness of numerical predictions of a carefully controlled flow experiment that includes all of the main physical processes that may be present in underground CO2 storage operations: buoyancy, structural capillary trapping, residual trapping, dissolution and convection. In particular, numerical results will be subject to several choices by each participant. These include how the experimental apparatus is modelled, how the tracer test data are used to inform the ex-situ petrophysical properties provided, and which physical processes are included in the numerical simulators. With the experimental results providing a well-controlled “ground truth”, the impact of these modelling choices on the final predictions can be assessed.

In this manuscript, we detail our approach to modelling the FluidFlower experiment. We simplify the tracer tests to a steady-state Darcy flow to allow rapid inversion of the data using the high-quality spatial concentration data. We then detail the rapid development of a finite volume simulator suitable for this modelling exercise, before presenting our “best case” numerical predictions of the experiment. Finally, we discuss how the sparse data statistics required by the organisers were calculated using a small set of ensemble models.

2 Model Characterisation

Participants were provided with high-resolution curvature-corrected images of the FluidFlower apparatus, along with average multiphase flow properties for each of the sand types, see (Nordbotten et al. 2022) for details. Images and pressure data from tracer tests were also provided, to enable further characterisation of the model properties.

The first step in this modelling study was to produce a digital mesh suitable for numerical computations from the images provided by the organisers. Initial attempts to segment the image into distinct sand types automatically through a thresholding process were unsatisfactory, due to the poor contrast between layers in some sections (particularly where the lighting in the photographs was dim), despite attempts to enhance contrast. Instead, a semi-automated picking routine was used to delineate sedimentary horizons and faults in the model, and the picked horizons were then converted by code into distinct zones, see Fig. 1.

Fig. 1

Digitised and interpreted facies-label (green labels) image used as the basis for the parameterisation. The labels correspond to sand type {ESF (1); C (2); D (3); E (4); F (5); G (6); Fault 1 (7); Fault 2 (8); Fault 3 (9)}. The red labelled ports 0, 1,... 5 are used in the inversion, corresponding to ports {5_3(0), 5_7(1), 9_3(2), 15_5(3), 17_7(4), 17_11(5)}, see (Nordbotten et al. 2022) for full details

It was decided at the start of this analysis to undertake all modelling in two dimensions only, due to the relative length scales of the experimental apparatus (Nordbotten et al. 2022). The FluidFlower thickness (nominal \(\approx 19\) mm) was provided at regular points in the vertical plane (Nordbotten et al. 2022). Bilinear interpolation was used to provide thicknesses at each point in the model, and this thickness was used to provide spatially-varying thickness-averaged petrophysical properties (porosity and permeability) of the form \(u(x, y) = u_0 h(x, y) / h_0\), where \(u_0\) is the property (porosity \(\phi\) or permeability \(k\)) taken from the single porosity and permeability values measured for each sand facies by the experimental team (Nordbotten et al. 2022), \(h(x, y)\) is the bilinearly interpolated thickness at any spatial point \((x, y)\), and \(h_0 = 0.019\) m is the nominal thickness.
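
The thickness scaling can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names `bilinear` and `thickness_scaled`, the 2 × 2 grid of thickness measurements and the porosity value 0.44 are assumptions made for the example. Only \(h_0 = 0.019\) m and the form \(u = u_0 h / h_0\) come from the text.

```python
import numpy as np

H0 = 0.019  # nominal thickness (m), from the text


def bilinear(xg, yg, hg, x, y):
    """Bilinearly interpolate hg, of shape (len(yg), len(xg)), at the point (x, y)."""
    # Index of the grid cell containing (x, y), clamped to the grid.
    i = int(np.clip(np.searchsorted(xg, x) - 1, 0, len(xg) - 2))
    j = int(np.clip(np.searchsorted(yg, y) - 1, 0, len(yg) - 2))
    tx = (x - xg[i]) / (xg[i + 1] - xg[i])
    ty = (y - yg[j]) / (yg[j + 1] - yg[j])
    return ((1 - tx) * (1 - ty) * hg[j, i] + tx * (1 - ty) * hg[j, i + 1]
            + (1 - tx) * ty * hg[j + 1, i] + tx * ty * hg[j + 1, i + 1])


def thickness_scaled(u0, h):
    """Thickness-averaged property u = u0 * h / h0."""
    return u0 * h / H0


# Hypothetical thickness measurements (m) at the corners of a 1 m x 1 m patch.
xg = np.array([0.0, 1.0])
yg = np.array([0.0, 1.0])
hg = np.array([[0.019, 0.021],
               [0.019, 0.021]])

h_mid = bilinear(xg, yg, hg, 0.5, 0.5)
phi_mid = thickness_scaled(0.44, h_mid)  # 0.44 is an illustrative porosity
```

Where the interpolated thickness exceeds the nominal value, the scaled porosity and permeability are proportionally larger than the measured facies values.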

The initial characterisation of the model by facies was refined by inversion using data from the tracer tests (Nordbotten et al. 2022). A detailed description of the inversion methodology used is provided in Tian et al. (2023), so only a brief description of this characterisation is presented here.

Two types of data from the tracer tests were provided: pressure measurements at each monitoring port for the duration of the tracer tests, and digital images of the injected tracer (Nordbotten et al. 2022). To assist in inversion, the tracer images were processed to produce difference images (using the background image before the tracer injections as the reference). Constant attenuation scaling was used to convert the image intensity to tracer concentration

$$\begin{aligned} c_i = \rho _t I_i, \end{aligned}$$
(1)

where \(c_i\) is the tracer concentration (−) in the \(i^{\text {th}}\) pixel, \(\rho _t = 1002\) kg.m\(^{-3}\) is the tracer density, and \(I_i\) is the (scaled) intensity (−) of the \(i^{\text {th}}\) pixel. Mass conservation was used to scale the concentrations to ensure a global mass balance using the (known) injection volume

$$\begin{aligned} V = Q t = \rho _t \sum _i \phi _i v_i I_i, \end{aligned}$$
(2)

where \(V\) is the injected tracer volume (m\(^3\)), \(Q\) the (constant) volumetric injection rate (m\(^3\).s\(^{-1}\)), \(t\) is time (s), \(\phi _i\) the porosity of pixel \(i\), and \(v_i\) its volume (area multiplied by interpolated thickness). This process was found to be reliable, with only small mass balance errors in a few individual pixels. The resulting concentration maps for each stage of the tracer tests are presented in Fig. 2.
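
One way to read the mass-balance step is as a single global rescaling of the image intensities, so that the pore-volume-weighted sum of the concentration recovers the known injected volume \(V = Qt\). The sketch below illustrates that reading only. The function name, the synthetic intensity image and the uniform porosity and thickness are assumptions, and the global scale factor plays the role of the constant of proportionality in Eqs. (1) and (2).

```python
import numpy as np


def rescale_to_injected_volume(intensity, porosity, pixel_volume, injected_volume):
    """Scale intensities so that sum(phi_i * v_i * c_i) equals the injected volume."""
    pore_weighted_sum = np.sum(porosity * pixel_volume * intensity)
    scale = injected_volume / pore_weighted_sum
    return scale * intensity, scale


# Synthetic 300 x 568 image on 5 mm pixels with a square "plume" (illustrative only).
intensity = np.zeros((300, 568))
intensity[100:200, 200:300] = 0.8
porosity = np.full(intensity.shape, 0.44)                        # assumed uniform
pixel_volume = np.full(intensity.shape, 0.005 * 0.005 * 0.019)   # area x thickness (m^3)

Q = 6.25e-7   # injection rate (m^3/s), from the text
t = 1800.0    # one 30 min injection (s)

conc, scale = rescale_to_injected_volume(intensity, porosity, pixel_volume, Q * t)
recovered_volume = np.sum(porosity * pixel_volume * conc)
```

After rescaling, `recovered_volume` matches `Q * t` to rounding error, which is the global mass balance the text describes.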

Fig. 2

Concentration of tracer computed from provided well test images. Each column represents one stage of the three-stage tracer test, and each row shows the evolution of the tracer plume during each stage, see (Nordbotten et al. 2022) for full details

An obvious feature of the pressure data is the instant increase from ambient pressure once each injection begins (and likewise, an instant drop back to ambient pressure once injection ceases), suggesting rapid pressure communication throughout the experimental apparatus. The characteristic time scale for pressure diffusion in the FluidFlower, \(t=\frac{\phi \mu _w c_t L^2}{k_{\text {sc}}}\), is very short, typically 5 ms using suitable values for \(\phi =0.45\), \(\mu _w=10^{-3}\) Pa.s, \(c_t\approx 0.73 \times 10^{-9}\) Pa\(^{-1}\), \(L=1\) m, \(k_{\text {sc}}=6.9874\times 10^{-10}\) m\(^2\). All pressures are rescaled to dimensionless pressure using a scaling \(P_{\text {sc}} = Q_{\text {sc}} \mu _w/(k_{\text {sc}}h_0)\), with additional chosen constants \(h_0=0.025\) m (average cell thickness), \(Q_{\text {sc}}=6.25\times 10^{-7} \text{ m}^3/\)s (the water injection rate) yielding \(P_{\text {sc}}=35.778\) Pa.
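
The pressure scale can be reproduced directly from the quoted constants. The short check below uses only values stated in the text; the variable names are chosen for the example.

```python
# Unit conversion for the injection rate: 1 ml/h = 1e-6 m^3 / 3600 s.
ML_PER_HOUR_TO_M3_PER_S = 1.0e-6 / 3600.0

Q_sc = 2250.0 * ML_PER_HOUR_TO_M3_PER_S  # water injection rate (m^3/s)
mu_w = 1.0e-3                            # water viscosity (Pa.s)
k_sc = 6.9874e-10                        # reference permeability (m^2)
h0 = 0.025                               # average cell thickness (m)

# P_sc = Q_sc * mu_w / (k_sc * h0)
P_sc = Q_sc * mu_w / (k_sc * h0)

print(f"Q_sc = {Q_sc:.3e} m^3/s, P_sc = {P_sc:.2f} Pa")
```

The result is approximately 35.78 Pa, in line with the value of \(P_{\text {sc}}\) quoted above, and the converted rate matches \(6.25\times 10^{-7}\) m\(^3\)/s.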

Because pressure equilibrates so quickly, the forward modelling was approximated as steady-state on the time scale of the experiment, with the pressure corresponding to the steady-state solution of the thickness-averaged 2D pressure diffusion equation for single-phase flow

$$\begin{aligned} c_t\phi h\frac{\partial P}{\partial t} - \nabla \cdot \left( \frac{k_w}{\mu _w} h(x, y) \nabla P \right) = Q(x, y, t), \end{aligned}$$
(3)

where \(P\) is the pressure (Pa), \(h\) the FluidFlower thickness (m), \(\phi\) the porosity, \(k_w\) the absolute permeability of the media (for water) (m\(^2\)), \(\mu _w\) the water viscosity (Pa.s), and \(c_t\) the total (rock plus water) compressibility (Pa\(^{-1}\)).

The injection rates are held relatively constant over each 30 min interval, so the overwhelming majority of the data are collected with the system equilibrated in terms of pressure given the rapid pressure communication. We therefore proceed on the basis that time-stepping the pressure diffusion equation is unnecessary, and solve the steady-state Poisson equation with the source \(Q\) at the two injection ports used in the tracer experiment (ports 9_3 and 17_7). The resulting pressure fields and velocities are then constant over each injection period, which is sufficient to compute the advection of the tracers.

The numerical model was constructed on a structured 5 mm grid (\(N_x=568\), \(N_y=300\)), with the upper boundary fixed at atmospheric pressure, and no–flow boundaries on the left, right, and base of the model.

Tracers were advected along streamlines using the fixed velocities computed from solving Eq. (3) and the upwinding scheme described in Koren (1993) for the duration of each tracer pulse. The time-stepping in the tracer computation was adjusted so the number of time steps divided the total tracer time exactly, which has the merit that any computed quantities from the tracer image are smooth differentiable functions of the parameters in the governing equation.
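
The idea of choosing the time step so that it divides the injection time exactly can be illustrated with a one-dimensional sketch. This is a deliberately simplified stand-in: it uses first-order upwinding rather than the limited scheme of Koren (1993), a constant positive velocity rather than the computed 2D velocity field, and the velocity, grid spacing and CFL number are assumed values.

```python
import math

import numpy as np


def advect_upwind_1d(c0, velocity, dx, total_time, cfl=0.5, inflow=1.0):
    """First-order upwind advection for velocity > 0.

    The step count is rounded up so that n_steps * dt equals total_time exactly
    (to rounding), while dt never exceeds the CFL-limited step.
    """
    dt_max = cfl * dx / velocity
    n_steps = max(1, math.ceil(total_time / dt_max))
    dt = total_time / n_steps
    c = np.array(c0, dtype=float)
    for _ in range(n_steps):
        # Value in the upstream neighbour; the first cell sees the inflow value.
        upstream = np.concatenate(([inflow], c[:-1]))
        c = c - velocity * dt / dx * (c - upstream)
    return c, n_steps, dt


# Illustrative values: 5 mm cells, 1e-4 m/s velocity, one 30 min injection.
c_final, n_steps, dt = advect_upwind_1d(
    np.zeros(200), velocity=1.0e-4, dx=0.005, total_time=1800.0
)
```

Because the final time is reached after a whole number of equal steps, no short remainder step is needed, so the computed concentrations vary smoothly with the model parameters as long as the step count itself does not change.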

The cross-port pressure data provided during the tracer experiment were conspicuously noisy and clearly close to the noise floor of the instruments. Significant drift was evident in these measurements, and even the stable values showed the curious property that their amplitude did not diminish consistently with distance from the injection port. Further, the gauges were located some distance (20 cm or more) from the actual injection face, so the measurements are subject to unknown frictional and other losses in the feed plumbing. These defects would bias the inversion if the pressure data were weighted heavily, which limits their usefulness. By contrast, the tracer images were very clearly interpretable, rich in spatial content, and not obviously contaminated by any significant experimental artefact. For these reasons, inversion proceeded using only the spatial concentration data. The effect of including suitably weighted cross-well pressure data in the inversion is considered in Appendix A.

The FluidFlower is initially filled with blue tracer. Each injection period was modelled using \(Q=2250\) ml/h (\(6.25\times 10^{-7}\text{ m}^3/\)s) for a \(3\times 30\) minute period at port 17_7 with clear tracer, then \(3\times 30\) minutes at port 9_3 with clear tracer, then \(3\times 30\) minutes at port 9_3 with blue tracer (Nordbotten et al. 2022). The forward modelling operation, which implements standard tracer advection under an upwinding scheme, is expressed below as a function \(\varvec{f}_t(\varvec{m})\) that generates concentration profiles that are very close to unity inside the swept region and fall rapidly to zero at the tracer front. The tracer advection is stepped forward for precisely the number of time steps needed for the injection, and numerical integrations of the total tracer mass over the modelling grid at the end of the simulation agree very closely with the mass known to be injected from \(Q\) in the tracer source.

Under the assumptions of the single-phase PDE and the fast equilibration time, the experimental 30 min wait times between injections do not need to be modelled, as nothing happens in the Poisson model when the sources are switched off, since the velocities are then instantly zero. The modelled tracer positions at the end of each 30 min injection period are compared to the experimental concentration data for inversion.

The inversion of these data was couched as a Bayesian inverse problem with a likelihood \(P(\varvec{y} |\varvec{m})\) formed as a joint probability using (in general) pressure and tracer data \(\varvec{y}=\{\textbf{P}_{\text {obs}},\textbf{c}_{\text {tracer}}\}\). The model parameters were taken to be multipliers \(m_l\) (\(l\in {\mathcal {L}}\equiv \{1,2\ldots 7,9\})\) of the permeability, one per facies, applied in a “paint-by-numbers” fashion over the labelled facies model. The thickness and porosity data were considered to be sufficiently precise and experimentally stable to be fixed for the purposes of model prediction and inversion. The Bayesian framework was completed by a weak prior on the multipliers, of Gaussian form \(P(m_l)\sim N(1,\sigma ^2)\) with \(\sigma =5\) for each parameter. The associated prior covariance is \(C_p=\text{ diag }\{\sigma ^2\}\). The model point estimate at the global maximum of the posterior probability is referred to as the MAP (maximum a posteriori) inversion.

The inversion is performed using a Levenberg–Marquardt routine (Nocedal and Wright 1999; Madsen et al. 2004), which requires the Jacobian J of the forward response with respect to the unknown model parameters. Since the model dimensionality was very low and the forward model speed very fast (measured in seconds), this was computed using simple forward differences.
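
A forward-difference Jacobian and a single damped Gauss-Newton (Levenberg–Marquardt) update with the Gaussian prior can be sketched as follows. This is an illustration under stated assumptions, not the routine used in the study: the function names are hypothetical, the linear `toy_forward` model stands in for the tracer simulator, and the data misfit is left unweighted. The prior mean of 1 and \(\sigma = 5\) follow the text.

```python
import numpy as np


def forward_difference_jacobian(f, m, rel_step=1.0e-6):
    """Approximate J[i, l] = d f_i / d m_l with one extra forward run per parameter."""
    m = np.asarray(m, dtype=float)
    f0 = np.asarray(f(m), dtype=float)
    J = np.empty((f0.size, m.size))
    for l in range(m.size):
        step = rel_step * max(1.0, abs(m[l]))
        m_pert = m.copy()
        m_pert[l] += step
        J[:, l] = (np.asarray(f(m_pert), dtype=float) - f0) / step
    return J


def lm_step(f, m, y, sigma=5.0, damping=0.0):
    """One update of (J^T J + C_p^{-1} + damping * I) dm = J^T r - C_p^{-1} (m - 1)."""
    m = np.asarray(m, dtype=float)
    J = forward_difference_jacobian(f, m)
    r = y - np.asarray(f(m), dtype=float)          # data residual
    prior_precision = np.eye(m.size) / sigma**2    # C_p^{-1}
    H = J.T @ J + prior_precision                  # Gauss-Newton Hessian
    g = J.T @ r - prior_precision @ (m - 1.0)      # negative gradient of E
    return m + np.linalg.solve(H + damping * np.eye(m.size), g)


# Toy linear forward model standing in for the tracer simulator (assumption).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])


def toy_forward(m):
    return A @ m


y_obs = toy_forward(np.array([1.2, 0.8]))
m_map = lm_step(toy_forward, np.array([1.0, 1.0]), y_obs)
```

For a linear forward model and zero damping, one step lands on the MAP estimate; for the nonlinear tracer problem the step would be repeated, with the damping adjusted between iterations.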

The negative log posterior \(E(\textbf{m})\sim -\log (P(\varvec{y} |\textbf{m})P(\textbf{m}))\) used in the optimisation step was written as a standard \(l_2\) misfit energy

$$\begin{aligned} E(\textbf{m})=E_{\text {pressure}}(\textbf{m})+E_{\text {tracer}}(\textbf{m})+E_p(\textbf{m}) \end{aligned}$$

where the cross–port pressure misfit, accumulated over only stable average measurements at ports p5.3_1, p5.7_1, p9.3_1, p15.5_1, p17.7_1, p17.11_1 is

$$\begin{aligned} E_{\text {pressure}}(\textbf{m}) = \tfrac{1}{2}\lambda _p\Vert \textbf{P}_{\text {obs}}-f_p(\textbf{m})\Vert _2^2 \end{aligned}$$

and the tracer image mismatch is written as

$$\begin{aligned} E_{\text {tracer}}(\textbf{m}) = \tfrac{1}{2}\lambda _t\Vert \textbf{c}_{\text {tracer}}-\textbf{f}_t(\textbf{m})\Vert _2^2 \end{aligned}$$

The weights \(\lambda _p,\lambda _t\) were adjusted so that the tracer data are dominant in the likelihood, as these data are much more abundant and artefact-free. The prior Bayesian term, with \(\sigma _p \equiv \sigma\) the prior standard deviation, amounts to

$$\begin{aligned} E_p(\textbf{m})=-\log (P(\textbf{m})) \sim \tfrac{1}{2}\sum _{l\in \mathcal {L}}(m_l-1)^2/\sigma _p^2 \end{aligned}$$

and has a very benign influence on the inversion. In dimensionless units, the pressure data \(\textbf{P}_{\text {obs}}\) are O(1) numbers, but rather noisy, so setting \(\lambda _p=1\) seems appropriate. The tracer data \(\textbf{c}\) are processed from the digital images to have concentration values ranging over \(0<y<1\). Since the associated \(l_2\) norm sums over a very large number of pixels, \(\lambda _t\) is scaled such that the tracer misfit energy is \(E_{\text {tracer}}(\textbf{m})=1000/2\) for a model that produces no tracer concentration (\(\textbf{f}_t(\textbf{m})=\varvec{0}\)), i.e. the information content is equivalent to 1000 measurements. In practice, since the volume in which experimental and forward-modelled concentrations differ is only a small fraction of the image, the misfit energy from this term ends up being O(10), which is perhaps equivalent to placing ten times as much emphasis on the tracer images as on the cross-well pressure data.
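
The normalisation of \(\lambda _t\) can be made concrete with a small sketch. The synthetic images below are assumptions chosen so that the arithmetic is easy to follow; only the target of \(1000/2\) for an empty model comes from the text.

```python
import numpy as np


def tracer_weight(c_obs, equivalent_measurements=1000.0):
    """lambda_t such that E_tracer = equivalent_measurements / 2 when f_t(m) = 0."""
    return equivalent_measurements / np.sum(c_obs**2)


def tracer_energy(c_obs, c_model, lam_t):
    """E_tracer = 0.5 * lambda_t * ||c_obs - c_model||_2^2."""
    return 0.5 * lam_t * np.sum((c_obs - c_model) ** 2)


# Synthetic "observed" plume: 10,000 pixels at unit concentration (illustrative).
c_obs = np.zeros((300, 568))
c_obs[100:200, 200:300] = 1.0

lam_t = tracer_weight(c_obs)

# A model with no tracer at all.
E_empty = tracer_energy(c_obs, np.zeros_like(c_obs), lam_t)

# A model that misses a thin strip of 200 pixels (2% of the plume).
c_model = c_obs.copy()
c_model[100:102, 200:300] = 0.0
E_close = tracer_energy(c_obs, c_model, lam_t)
```

In this toy case the empty model gives an energy of 500, while a model that mismatches 2% of the plume gives an energy of 10, which mirrors the O(10) misfit energy described in the text.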

Our preferred model choice was to omit the problematic pressure data and run the inversion using the data from the three tracer injections alone. It was also considered reasonable to merge the parameters for regions 5 and 6, since region 6 is at the edge of the modelling region and will have a more fragile permeability inference. The corresponding parameter inferences are shown in Table 1, which lists the dimensionless multipliers and the corresponding dimensional permeabilities. The final column is the dimensionless uncertainty estimate \(\sigma _l^2=H^{-1}_{ll}\) for each parameter formed from the inverse Hessian matrix at the final optimum, where \(H=J^TJ+C_p^{-1}\) and \(C_p\) is the effective Bayesian prior covariance. The final inversion forward model images and associated data snapshots are depicted in Fig. 3, where good qualitative agreement between the model and experimental data is observed.

Table 1 Inversion results from the three-tracer model. Permeabilities provided in benchmark description (\(k_l\)) (Nordbotten et al. 2022) are multiplied by \(m_l\) to give the permeabilities used in the modelling. Note: sands F and G merged to a common (averaged) permeability

Fig. 3

Tracer data image (top row), associated MAP forward models (middle row), and difference between data and model (bottom row). Each column represents one stage of the three-stage tracer test

3 Numerical Model for Multiphase Predictions

In order to model the FluidFlower CO\(_2\) prediction benchmarks, we solve the two-phase, two-component mass conservation equations

$$\begin{aligned} \frac{\partial }{\partial t} \left\{ \phi \sum _{\beta } S_{\beta } \rho _{\beta } \chi _{\beta }^{\kappa } \right\} + \nabla \cdot \varvec{F}^{\kappa } - q^{\kappa } = 0 \end{aligned}$$
(4)

where \(S_{\beta }\) is the saturation of phase \(\beta\) (either liquid or gas in this case), \(\rho _{\beta }\) is the density of phase \(\beta\) (kg.m\(^{-3}\)), \(\chi ^{\kappa }_{\beta }\) is the mass fraction of fluid component \(\kappa\) (either water or CO\(_2\) in this case) in phase \(\beta\), \(\varvec{F}^{\kappa }\) is the mass flux of fluid component \(\kappa\) (kg.m\(^{-2}\).s\(^{-1}\)), and \(q^{\kappa }\) is an external source/sink (kg.m\(^{-3}\).s\(^{-1}\)).

The fluid flux is a sum of advective and diffusive fluxes (no hydrodynamic dispersion is included in this model):

$$\begin{aligned} \varvec{F}^{\kappa } = -\sum _{\beta }\left( \chi _{\beta }^{\kappa } \rho _{\beta } \frac{\varvec{k} k_{\text{r}, \beta }}{\mu _{\beta }} \left( \nabla P_{\beta } - \rho _{\beta } \varvec{g}\right) + \phi \rho _{\beta } d_{\beta }^{\kappa } \nabla \chi _{\beta }^{\kappa } \right) , \end{aligned}$$
(5)

where \(\varvec{k}\) is the absolute permeability tensor (m\(^2\)), \(k_{\text{r},\beta }\) is the relative permeability of phase \(\beta\) (−), \(\mu _{\beta }\) is the dynamic viscosity of phase \(\beta\) (Pa.s), \(P_{\beta }\) is the pressure of each phase (Pa), \(\varvec{g}\) is gravity (m.s\(^{-2}\)), and \(d_{\beta }^{\kappa }\) is the molecular diffusion coefficient of component \(\kappa\) in phase \(\beta\) (m\(^2\).s\(^{-1}\)).

The mass balance is closed by noting

$$\begin{aligned} \sum _{\beta } S_{\beta } = 1, \qquad \sum _{\kappa } \chi _{\beta }^{\kappa } = 1 \,\, \forall \beta , \end{aligned}$$
(6)

and that the phase pressures are related by the capillary pressure \(P_\text{c}\)

$$\begin{aligned} P_\text{c} = P_g - P_l, \end{aligned}$$
(7)

where \(P_g\) and \(P_l\) are the gas and liquid phase pressures, respectively.

Although hydrodynamic dispersion would be expected to influence the geometry of downwelling fingers during the anticipated convective mixing (amongst other factors that will also affect the size and position of the fingers, such as grid resolution), it is not expected to influence the mass flux of CO2 from the gas phase to the liquid phase (Liang et al. 2018), which is the primary metric that we are concerned with.

The choice of capillary pressure and relative permeability models to implement is often informed by experimental values. In this case, in the absence of detailed experimental curves for these quantities, we chose to use the common Brooks-Corey forms for both with only experience for justification (Brooks and Corey 1966)

$$\begin{aligned} P_\text{c} = P_e S_{\text {eff}}^{-1/\lambda }, \end{aligned}$$
(8)

where \(P_e\) is the capillary threshold entry pressure, \(\lambda\) is the Brooks-Corey exponent, and

$$\begin{aligned} S_{\text {eff}} = \frac{S_l - S_{l,r}}{1 - S_{l,r}}, \end{aligned}$$
(9)

where \(S_{l,r}\) is the irreducible saturation of the liquid phase. The Brooks-Corey relative permeability model is (Brooks and Corey 1966)

$$\begin{aligned} k_{\text{r},l}&= k_{l0} S_{\text {eff}}^{c_l}, \end{aligned}$$
(10)
$$\begin{aligned} k_{\text{r},g}&= k_{g0} (1 - S_{\text {eff}})^2 \left( 1 - S_{\text {eff}}^{(2 + c_{g})/c_{g}}\right) , \end{aligned}$$
(11)

where \(k_{l0}\) and \(k_{g0}\) are the end point relative permeabilities, and \(c_l\) and \(c_{g}\) are coefficients. In all of the modelling, we used the capillary entry pressures and endpoint saturations provided for each sand type (Nordbotten et al. 2022), and constant exponents \(\lambda = 2\), \(c_l = c_g = 2\).
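
Equations (8) to (11) translate directly into code. The sketch below is an illustration only: the function names, the entry pressure of 1500 Pa, the irreducible saturation of 0.1 and the unit endpoint relative permeabilities are assumed values rather than the benchmark values, while the exponents \(\lambda = c_l = c_g = 2\) follow the text.

```python
import numpy as np


def effective_saturation(s_l, s_lr):
    """Eq. (9): S_eff = (S_l - S_lr) / (1 - S_lr), clipped to [0, 1]."""
    return np.clip((s_l - s_lr) / (1.0 - s_lr), 0.0, 1.0)


def capillary_pressure(s_eff, p_entry, lam=2.0):
    """Eq. (8): P_c = P_e * S_eff^(-1/lambda). Diverges as S_eff -> 0."""
    return p_entry * s_eff ** (-1.0 / lam)


def relperm_liquid(s_eff, k_l0=1.0, c_l=2.0):
    """Eq. (10): k_rl = k_l0 * S_eff^c_l."""
    return k_l0 * s_eff**c_l


def relperm_gas(s_eff, k_g0=1.0, c_g=2.0):
    """Eq. (11): k_rg = k_g0 * (1 - S_eff)^2 * (1 - S_eff^((2 + c_g) / c_g))."""
    return k_g0 * (1.0 - s_eff) ** 2 * (1.0 - s_eff ** ((2.0 + c_g) / c_g))


# Illustrative values (assumptions): S_l = 0.55, S_lr = 0.1, P_e = 1500 Pa.
s_eff = effective_saturation(0.55, 0.1)
p_c = capillary_pressure(s_eff, p_entry=1500.0)
k_rl = relperm_liquid(s_eff)
k_rg = relperm_gas(s_eff)
```

At full liquid saturation the capillary pressure equals the entry pressure and the gas relative permeability is zero, which is what makes a high-entry-pressure sand act as a capillary barrier.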

To solve the governing equations, we use the open-source multiphysics code MOOSE (Lindsay et al. 2022; Green et al. 2018; Wilkins et al. 2020, 2021). The governing equations are discretised in space using the finite volume method with a two-point flux approximation by default (although extended stencils are available in MOOSE if desired). A structured two-dimensional mesh is used, with each cell being 0.01 m by 0.01 m to match the discretisation required for reporting the spatial distribution of CO2. Thickness-averaged porosity and permeability (where permeability is calculated through inversion described above) are used, such that the porosity and permeability in the centre of the FluidFlower are higher than at the edges to account for deformation of the perspex face, see Sect. 2. A constant molecular diffusion coefficient for CO2 in the liquid phase, \(d = 2 \times 10^{-9}\) m\(^2\).s\(^{-1}\), was used for all facies.

The finite volume discretisation is inherently conservative, and hence mass conservation is assured. Flux across the face of adjoining elements is calculated using linear interpolation of face-centred flux gradients, which makes the default implementation second-order accurate in space (Moukalled et al. 2016). Phase mobility \(\rho _{\beta } k_{\text{r},\beta } \varvec{k} / \mu _{\beta }\) is upstream weighted (upwinded). Various temporal discretisations are available as part of the MOOSE framework, allowing higher-order time stepping if required. In this case, the governing equations are solved in a fully implicit manner using implicit Euler timestepping.

The finite volume method is a relatively new addition to MOOSE (Lindsay et al. 2022). Previous code for flow in porous media that has been implemented within MOOSE (Green et al. 2018; Wilkins et al. 2020, 2021) has used the finite element method. However, as the flow here was expected to be strongly dependent on capillary barriers, this study presented an opportunity to implement porous flow using a finite volume discretisation within MOOSE that would enable easy handling of sharp discontinuities in gas saturation at facies boundaries. Using the automatic differentiation capability within MOOSE (Lindsay et al. 2021), code development for this problem was rapid. Indeed, most of the coding was completed within only a few days. As with previous efforts for modelling flow in porous media using MOOSE, this capability has been open-sourced and is therefore available to all users.

Fluid properties are computed using high-accuracy equations of state. Water density is computed using the IAPWS Industrial Formulation 1997 (IAPWS 2007), while water viscosity is calculated using the IAPWS 2008 formulation (IAPWS 2008). The density of CO2 is calculated using the Span and Wagner equation of state (Span and Wagner 1996), with viscosity calculated using (Fenghour et al. 1998). These choices were made for both accuracy, as each equation of state is widely used in practice and has demonstrated accuracy over a wide range of pressures and temperatures, and convenience, as each is already available in MOOSE.

The Span and Wagner formulation for CO2 uses density and temperature as the primary variables with which to calculate properties such as pressure, enthalpy and internal energy. In order to use pressure and temperature as the primary variables, it is necessary to calculate density by iteration. As this can be a bottleneck in the computations, a tabulated version of this equation of state is used for speed. This is several orders of magnitude faster than evaluating the high-accuracy equation of state directly, and is comparable in time to simpler approximations of fluid properties.

The partitioning of CO2 and water in each phase is computed using a high-precision brine-CO2 equation of state that calculates the mutual solubility of CO2 into the liquid brine and pure water into the CO2-rich gas phase using an accurate fugacity formulation (Spycher et al. 2003; Spycher and Pruess 2005, 2010). As with many reservoir simulators, it is assumed that the phases are in instantaneous equilibrium, meaning that dissolution of CO2 into the brine and evaporation of water into the CO2 phase happens instantaneously.

The density of the aqueous phase with the contribution of dissolved CO2 is calculated using a thermodynamically-consistent mixing rule for partial molar volumes

$$\begin{aligned} \frac{1}{\rho } = \frac{1 - \chi _{CO2}}{\rho _b} + \frac{\chi _{CO2}}{\rho _{CO2}}, \end{aligned}$$
(12)

where \(\rho _b\) is the density of brine, \(\chi _{CO2}\) is the mass fraction of CO2 dissolved in the aqueous phase, and \(\rho _{CO2}\) is the partial density of dissolved CO2 (Garcia 2001).
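
Equation (12) is a harmonic, mass-fraction-weighted average of the two densities. The sketch below evaluates it for illustrative inputs: the brine density of 998 kg.m\(^{-3}\), the dissolved mass fraction of 0.0015 and the partial density of 1200 kg.m\(^{-3}\) are assumed round numbers, not values taken from the benchmark or from Garcia (2001).

```python
def aqueous_density(rho_b, x_co2, rho_co2_partial):
    """Eq. (12): 1/rho = (1 - x) / rho_b + x / rho_co2_partial."""
    return 1.0 / ((1.0 - x_co2) / rho_b + x_co2 / rho_co2_partial)


# Illustrative inputs (assumptions), all densities in kg/m^3.
rho_b = 998.0
rho_co2_partial = 1200.0
x_co2 = 0.0015

rho_unsaturated = aqueous_density(rho_b, 0.0, rho_co2_partial)
rho_saturated = aqueous_density(rho_b, x_co2, rho_co2_partial)
density_increase = rho_saturated - rho_unsaturated
```

Whenever the partial density of dissolved CO2 exceeds the brine density, the CO2-bearing water is slightly denser than the water beneath it. That small increase is what drives the downwelling fingers discussed in Sect. 4.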

As water vapour is only ever a small component of the gas phase in the temperature and pressure ranges considered, the density of the gas phase is assumed to be simply the density of CO2 at the given pressure and temperature.

No contribution to the viscosity of each phase due to the presence of CO2 in the aqueous phase or water vapour in the gas phase is included. As a result, the viscosity of the aqueous phase is simply the viscosity of brine, while the viscosity of the gas phase is the viscosity of CO2.

A persistent set of primary variables that remain independent in all phase states is used for this miscible multiphase flow problem. In this approach, the primary variables are pressure of a reference phase \(\beta\), \(P_{\beta }\), and total mass fraction of a fluid component summed over all phases

$$\begin{aligned} z^{\kappa } = \frac{\sum _{\beta } S_{\beta } \rho _{\beta } \chi _{\beta }^{\kappa }}{\sum _{\beta } S_{\beta } \rho _{\beta }}. \end{aligned}$$
(13)

Using this set of persistent primary variables, saturation is calculated using a compositional flash, after which all fluid properties can be computed.
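
The link between the total mass fraction of Eq. (13) and the gas saturation can be illustrated with a lever-rule sketch. This is not the flash implemented in MOOSE. It assumes that the equilibrium CO2 mass fractions in the liquid (`x_co2`) and gas (`y_co2`) phases and the phase densities are already known, and the numerical values are illustrative only.

```python
def total_mass_fraction(s_g, rho_g, rho_l, y_co2, x_co2):
    """Eq. (13) for CO2 with two phases: z = sum(S rho chi) / sum(S rho)."""
    s_l = 1.0 - s_g
    return (s_g * rho_g * y_co2 + s_l * rho_l * x_co2) / (s_g * rho_g + s_l * rho_l)


def gas_saturation_from_z(z, rho_g, rho_l, y_co2, x_co2):
    """Invert Eq. (13) with the lever rule, given equilibrium mass fractions."""
    if z <= x_co2:
        return 0.0  # all CO2 can be held in the liquid: single-phase liquid
    if z >= y_co2:
        return 1.0  # single-phase gas
    v = (z - x_co2) / (y_co2 - x_co2)  # mass fraction of the mixture that is gas
    vol_g = v / rho_g
    vol_l = (1.0 - v) / rho_l
    return vol_g / (vol_g + vol_l)


# Illustrative near-surface values (assumptions): densities in kg/m^3.
rho_g, rho_l = 1.8, 998.0
y_co2, x_co2 = 0.98, 0.0015

z = total_mass_fraction(0.3, rho_g, rho_l, y_co2, x_co2)
s_g_recovered = gas_saturation_from_z(z, rho_g, rho_l, y_co2, x_co2)
s_g_small_excess = gas_saturation_from_z(0.002, rho_g, rho_l, y_co2, x_co2)
```

With these values, raising the total CO2 mass fraction from 0.0015 to 0.002 already gives a gas saturation above 0.2. Because the gas is so light, a tiny change in the primary variable produces a large change in saturation.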

4 Multiphase Flow Predictions

A dominating feature of this benchmark problem, independent of the visual appeal of the ‘representative model’ of typical geology at expected operating depths of \(>900\) m, is that the operating conditions are near-surface. One immediate consequence is that the density ratio between the water and the gaseous CO2 is approximately 500, more than two orders of magnitude greater than that for depths commonly considered in practice, where the densities of supercritical CO2 and formation water differ by less than a factor of two. Combined with the relatively high sand permeabilities and low injection rate, which result in a small viscous pressure gradient during injection, it follows that migration of the gas phase CO2 is primarily driven by buoyancy, while the spatial distribution of this phase is mainly controlled by the capillary entry pressures of each sand unit (and to a lesser extent the permeability contrast between sand units). The capillary number \(Ca = \mu u / \gamma\), where \(u\) is the Darcy velocity (m.s\(^{-1}\)) of the gas phase and \(\gamma\) is the interfacial surface tension (\(\gamma \approx 3 \times 10^{-2}\) N.m\(^{-1}\)), is estimated to be \(< O(10^{-6})\). This problem is therefore amenable to modelling by invasion percolation (e.g. Cavanagh and Haszeldine 2014; Trevisan et al. 2017). Indeed, one of our first attempts at modelling the CO2 plume used a simple invasion percolation model that ran in less than a second. This provided a first-order sanity check for the problem, where we could see that CO2 injected in the bottom port would fill the structure beneath the sealing unit before spilling up the lower fault, for example.

An unwelcome and perverse consequence of the density contrast was the numerical difficulty in modelling this experiment. As the density of CO2 in the gas phase is so small, cells where a gas phase appears (when the amount of CO2 present in the cell exceeds the equilibrium solubility in water) necessitate extremely small timesteps, as we now explain through a simple example. If we consider the cells in the immediate vicinity of an injection port, CO2 saturates the liquid phase and a gas phase appears. However, as the density of CO2 at the operating conditions is less than 2 kg.m\(^{-3}\), any gas phase rapidly fills the cell such that the gas saturation rises sharply in those cells, resulting in steep gradients in gas saturation between adjacent cells. All numerical solvers that we tried found it difficult to converge when this occurred, especially when multiple adjacent cells changed from single-phase liquid to two-phase liquid and gas. In order to reduce the numerical residual sufficiently, the timestep needed to be cut sharply, which increased the overall simulation time. Through trial and error, we found that the fastest simulation time could be achieved by taking timesteps of only 2 s during the injection phase, which could be increased to 60 s during the post-injection phase.

Due to the small timesteps required, the runtime of each model was greater than a day. As the models had only \(\approx\) 40,000 cells, using multiple processors could only slightly mitigate the time required due to the small timesteps, as interprocessor communication quickly negated any speedup gained by parallelisation. This meant that the computer wall time was greater than the timesteps within the simulation during the injection phase, which is a significant difference to the typical CO2 modelling work that is performed, where wall time must be several orders of magnitude smaller than the simulation timesteps in order for modelling underground storage for hundreds of years to be feasible.

As the onset time for convective mixing was expected to be short given the high permeability of the sands (Emami-Meybodi et al. 2015), convection was expected to be significant, especially in the regions beneath the sealing units. Though instabilities due to numerical rounding will eventually lead to convection even in a homogeneous sand, the onset of convection can be artificially delayed. In practice, the size and distributions of the sand grains within each sand unit will vary slightly, so that the sand unit is not entirely homogeneous. To avoid any delayed convection in the numerical simulations, a small random noise sampled from a uniform distribution of \(\pm 0.01 k\) was added to the permeability of each sand unit.
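
The permeability perturbation amounts to one line of array arithmetic. The sketch below assumes the noise is drawn independently for each cell and applied multiplicatively. The function name, the seed and the toy permeability field are choices made for the example.

```python
import numpy as np


def perturb_permeability(k, amplitude=0.01, seed=0):
    """Return k * (1 + eps), with eps drawn uniformly from [-amplitude, amplitude] per cell."""
    rng = np.random.default_rng(seed)
    eps = rng.uniform(-amplitude, amplitude, size=np.shape(k))
    return np.asarray(k, dtype=float) * (1.0 + eps)


# Toy field on a 150 x 284 grid (about 40,000 cells), using the reference
# permeability quoted in Sect. 2 as a uniform value (illustrative only).
k_uniform = np.full((150, 284), 6.9874e-10)
k_noisy = perturb_permeability(k_uniform)
ratio = k_noisy / k_uniform
```

Fixing the seed keeps an individual run reproducible, while changing it gives a different realisation of the same perturbation.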

The results from the numerical model are presented in Fig. 4. This figure shows both CO2 in the gas phase (white), and CO2 dissolved in the liquid phase (grey). Carbon dioxide injected in the lower port rises due to density difference and spreads beneath the sealing unit. This structure fills until CO2 spills into the heterogeneous fault in the lower left corner, whereby it rapidly rises, fills beneath a sand unit of higher threshold entry pressure, before the capillary pressure increase leads to breakthrough. The CO2 then flows upwards, until it reaches the upper sealing unit. It then begins to fill beneath this seal until injection stops, see the upper left-hand part of Fig. 4. A second CO2 plume is created during injection in the upper injection port. This CO2 rises and spreads beneath the upper sealing unit. Some CO2 reaches the high-permeability fault in this part, and some spreading along the two fingers of this fault is observed, as per the upper left-hand part of Fig. 4.

At the end of the five-hour injection period, approximately 70% of the total injected CO2 is in the gas phase, with the remaining CO2 dissolved in the liquid phase. This can be seen in the upper-left part of Fig. 4, where most of the plume contains gas-phase CO2. A small amount of dissolved CO2 (grey) can be seen diffusing beneath both the top and bottom plumes, and some dissolved CO2 is diffusing into the sealing unit atop the bottom plume. Note that wherever gas-phase CO2 is present within the plume, the liquid is fully saturated with CO2, including in the vicinity of the injection ports (due to the instantaneous phase equilibrium assumption inherent in the equation of state used).
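A phase partition such as the 70% figure above is typically obtained by summing CO2 mass over the cells of the model. The sketch below shows one way this could be done from cell-wise output. It is not the post-processing used in the study: the `Cell` fields, the function names and the numerical values are hypothetical, and the gas phase is assumed to be pure CO2 (water vapour is neglected).

```python
from dataclasses import dataclass


@dataclass
class Cell:
    pore_volume: float     # porosity * cell volume, m^3
    gas_saturation: float  # S_g, dimensionless
    gas_density: float     # kg/m^3
    liquid_density: float  # kg/m^3
    x_co2_liquid: float    # mass fraction of CO2 in the liquid phase


def co2_phase_masses(cells):
    """Return (gas-phase CO2 mass, dissolved CO2 mass) in kg."""
    # Assumption: the gas phase is pure CO2.
    gas = sum(c.pore_volume * c.gas_saturation * c.gas_density for c in cells)
    dissolved = sum(
        c.pore_volume * (1.0 - c.gas_saturation) * c.liquid_density * c.x_co2_liquid
        for c in cells
    )
    return gas, dissolved


def gas_fraction(cells):
    """Fraction of the total CO2 mass that resides in the gas phase."""
    gas, dissolved = co2_phase_masses(cells)
    total = gas + dissolved
    return gas / total if total > 0.0 else 0.0


# Two illustrative cells: one inside the plume, one holding only dissolved CO2.
cells = [
    Cell(1.0e-6, 0.5, 2.0, 1000.0, 0.001),
    Cell(1.0e-6, 0.0, 2.0, 1000.0, 0.0005),
]
print(f"gas-phase fraction: {gas_fraction(cells):.3f}")
```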

Fig. 4 Evolution of the CO2 distribution over time. Black represents no CO2, white is CO2 in the gas phase, and grey is dissolved CO2

At the end of the first day, fingering is observed beneath the upper and lower plumes (see the upper-right part of Fig. 4), as the slightly denser CO2-saturated water moves downwards. Significant slumping at both injection ports is also present, with all gas-phase CO2 near the ports having dissolved. The dissolved CO2 near the lower injection port has reached the base of the FluidFlower and has begun spreading laterally, while the dissolved CO2 near the upper injection port has reached the top of the lower sealing unit and begun to spread outwards, with only a small amount diffusing into the seal. We also note that all of the CO2 that migrated up the lower fault into the upper-left part of the model has dissolved by the end of the first day, with the plume of dissolved CO2 flowing down the fault and just reaching the base of the model after 24 h.

Over the remaining days of the simulation, CO2 in the gas phase continues to dissolve into the water, as the downwelling fingers of CO2-rich water grow and merge, and unsaturated water is drawn upwards towards the base of the two plumes. The interplay between the downwelling fingers and the unsaturated water drawn upwards results in lateral movement of the fingers, as observed in Fig. 4. The fingers formed beneath the lower plume reach the base of the model after three days, see Fig. 4. After four days, only a small amount of CO2 in the gas phase is left in the model, with all CO2 dissolved by the end of the fifth day.

One interesting feature of the results presented in Fig. 4 is that most of the CO2 that invades the lower sealing unit appears to enter from above, that is, it is CO2 injected in the upper port above the sealing unit rather than CO2 from the lower injection port. As the CO2 above the sealing unit dissolves, the density of the saturated water increases slightly. This saturated water sinks and spreads along the top of the sealing unit, from where dissolved CO2 diffuses into the sealing unit. This is especially evident in the final snapshot at the end of five days, shown in the lower-right part of Fig. 4.

These numerical results show good qualitative agreement with the experimental results (Fernø et al. 2023). A detailed quantitative comparison of the sparse data metrics specified in the benchmark description (Nordbotten et al. 2022) is presented in Flemisch et al. (2023), so for brevity it is not repeated in this manuscript.

5 Conclusion

This paper describes our approach to modelling the novel FluidFlower International Benchmark experiment. For model calibration or inference, sand permeability scale factors were optimised to match modelled tracer data to digitised tracer concentration images. The forward model in the fitting phase employed a simplified steady-state Poisson model to minimise numerical cost. The multiphase flow experiment was then modelled using a rapidly developed finite volume capability in the open-source simulation framework MOOSE. A small set of ensemble simulations was undertaken to explore key sensitivities and to compute the statistical metrics required for the benchmarking exercise. The spatial distribution of CO2 predicted in the numerical models showed good qualitative agreement with the suite of experimental results, and many of the key metrics show good quantitative agreement with the experiments.