1 Introduction

\(\hbox {CO}_2\) capture and subsequent geologic carbon sequestration (GCS) is a climate-change mitigation technology that can be deployed at scale to offset anthropogenic \(\hbox {CO}_2\) emissions during the energy transition (Marcucci et al. 2017; European Academies Science Advisory Council (EASAC) 2018; Celia 2021; Intergovernmental Panel on Climate Change (IPCC) 2022). In GCS, reservoir simulation, including coupled flow and geomechanics, is the primary tool used to assess and manage geologic hazards such as fault leakage (e.g., Caine et al. 1996; Ingram and Urai 1999; Nordbotten and Celia 2012; Zoback and Gorelick 2012; Juanes et al. 2012; Jung et al. 2014; Vilarrasa and Carrera 2015; Saló-Salgado et al. 2023) and induced seismicity (e.g., Cappa and Rutqvist 2011; Zoback and Gorelick 2012; Juanes et al. 2012; Ellsworth 2013; Verdon et al. 2013; Alghannam and Juanes 2020; Hager et al. 2021). In response to the inherent uncertainties associated with modeling and simulation of \(\hbox {CO}_2\) storage (Nordbotten et al. 2012), building confidence in the forecasting capabilities of simulation models requires calibration (or, synonymously, history matching), a process that involves updating the reservoir model to match field observations as they become available (Oliver and Chen 2011; Doughty and Oldenburg 2020).

History matching is an ill-posed inverse problem (Oliver and Chen 2011). This means that multiple solutions (i.e., parameter combinations) exist that approximate the data equally well. Automated techniques such as Markov chain Monte Carlo, randomized maximum likelihood or ensemble-based methods can be used to quantify uncertainty in history-matched models, especially in combination with surrogate models to reduce forward model computational time (see Aanonsen et al. 2009; Oliver and Chen 2011; Jagalur-Mohan et al. 2018; Jin et al. 2019; Liu and Durlofsky 2020; Santoso et al. 2021; Landa-Marbán et al. 2023, forthcoming, and references therein). In practice, however, it may be difficult to ensure that the chosen simulation model provides the best possible forecast. This is due to different subsurface conditions, the inability to include all sources of uncertainty in the models, incomplete field data and limited time for history matching.

In the laboratory, intermediate-scale (\(\sim\)meter) experiments have been used to study the physics of petroleum displacement (e.g., Gaucher and Lindley 1960; Brock and Orr 1991; Cinar et al. 2006) and contaminant transport (e.g., Silliman and Simpson 1987; Wood et al. 1994; Lenhard et al. 1995; Fernández-García et al. 2004). Similar 2D and 3D flow rigs have recently been applied to \(\hbox {CO}_2\) storage, providing a link between core-scale measurements and field observations:

Kneafsey and Pruess (2010) found the impact of convective dissolution to be significant, using a page-size Hele–Shaw cell and numerical simulations. Neufeld et al. (2010) studied the scaling of convective dissolution and found it to be an important mechanism in the long-term trapping of injected \(\hbox {CO}_2\) in an idealized site. Wang et al. (2010) used a 3D setup to investigate the ability of electrical resistivity tomography to identify localized leaks. Trevisan et al. (2014, 2017) focused on the impact of structural and residual trapping. In homogeneous sands, they found that previous trapping models, such as the Land (1968) model, can approximate the residually trapped gas saturation (\({R}^2 > 0.6\)). Studying a heterogeneous aquifer characterized by a log-normal distribution of six different sand facies, they reported that trapping efficiency increased significantly due to structural trapping. A strong control of sand heterogeneity on upward migration of \(\hbox {CO}_2\) was also found by Lassen et al. (2015). Krishnamurthy et al. (2019, 2022) devised a novel technique to automate the process of beadpack/sandpack deposition and generate realistic depositional fabrics; they concluded that grain-size contrast and bedform architecture significantly impact \(\hbox {CO}_2\) trapping. Subsequently, Ni et al. (2023) presented modified invasion-percolation simulations and reported that bedform architecture can impact \(\hbox {CO}_2\) saturation if enough grain-size contrast is present. Askar et al. (2021) used a \(\sim\)8-m-long tank to test a framework for GCS monitoring of \(\hbox {CO}_2\) leakage. These studies employed homogeneous glass beads or sands, or focused on heterogeneities and bedform architectures in the aquifer layer; structural complexity was minimal.

In this paper, we use quasi-2D, intermediate-scale experiments of \(\hbox {CO}_2\) storage to evaluate, quantitatively, the forecasting capability of history-matched simulation models against well-defined spatial data. An attempt was made to recreate realistic basin geometries, including stacking of storage reservoirs, faults, caprock and overburden. We simulate each of the three presented experiments with three versions of a numerical model, each with increasing access to local petrophysical measurements. These different versions are denoted Model 1 (\(M_1\)), Model 2 (\(M_2\)) and Model 3 (\(M_3\)). This allows us to assess (1) the value of local information of the system, expressed in terms of sand petrophysical measurements, during history matching, and (2) transferability or forecasting capability of our matched simulation models, when tested against a different experiment. The term concordance is used to evaluate agreement between experiments and simulations (Oldenburg 2018).

2 Physical Experiments

The physical experiments of \(\hbox {CO}_2\) injection are conducted using the FluidFlower rigs. These rigs are meter-scale, quasi-2D tanks with transparent Plexiglass panels designed and built in-house at the University of Bergen (Fig. 1). Here, we used two tanks, with dimensions \(89.9\times 47\times 1.05\) cm and \(2.86\times 1.3\times 0.019\) m (referred to herein as Tank 1 and Tank 2, respectively). Different geologic settings are constructed by pouring unconsolidated sands with desired grain sizes into the water-saturated rigs. The rigs have multiple ports which allow flushing out fluids after a given \(\hbox {CO}_2\) injection, such that multiple injections can be conducted in the same setting. The location of the ports can be adjusted to accommodate different injection scenarios. A variety of techniques have been developed by UiB engineers in order to build complex structures such as folds and faults.

Fig. 1

Overview of the FluidFlower rigs and porous media used in the physical experiments. a Medium FluidFlower rig (Tank 1). b Snapshot during sand pouring to build the porous medium used in Experiments A1 and A2 in Tank 1 (Haugen et al. 2023, this issue). c Front view of porous medium in Tank 1, with lithologies in white and injector location shown with a red star. The length and height correspond to the porous medium. Note the fixed water table at the top. d Overview of the main FluidFlower rig (Tank 2), showing the back panel with sensor network. e Porous medium in Tank 2, used for Experiment B1, with lithologies in white. Location of injectors and Boxes A, B and C for analysis are shown with a red star and gray boxes, respectively. Length and variable height correspond to the porous medium

Below, we summarize the petrophysical measurements, experimental setup, geologic model/porous media construction and experimental schedule. Details on the conceptualization of the FluidFlower rigs and technical information are given in Fernø et al. (2023, this issue) and Eikehaug et al. (2023, this issue), while the full description of the physical experiment in Tank 1 and ex situ measurements are provided by Nordbotten et al. (2022) and Haugen et al. (2023, this issue). Further details on the experiment in Tank 2, as well as results of the international benchmark study (IBS), are provided by Flemisch et al. (2023, this issue).

2.1 Sand Petrophysical Properties

Measurements on the employed Danish quartz sands were conducted using specialized equipment to determine average grain size (d), porosity (\(\phi\)), permeability (k), capillary entry pressure (\(p_\text {e}\)) and drainage and imbibition saturation endpoints (denoted as connate water saturation, \(S_\text {wc}\), and trapped gas saturation, \(S_\text {gt}\)). The methodology is described by Haugen et al. (2023, this issue), and obtained values are provided in Table 1. Sands C, D, E and F are very well sorted, sand G is well sorted, and sand ESF is moderately sorted (Haugen et al. 2023, this issue). We verified that Darcy’s law is applicable in our system using the Reynolds number (\(R_\textrm{e}\)):

$$\begin{aligned} R_\textrm{e} = \frac{u d}{\nu } \end{aligned} \tag{1}$$

where u is the fluid discharge per unit area, d the mean grain diameter, and \(\nu\) the kinematic viscosity of the fluid. From our simulation results, matched to experimental observations, \(\max (R_\textrm{e})\le 1\), which ensures the applicability of Darcy’s law (e.g., Bear 1972).
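As an illustration of the check in Eq. 1, the following minimal sketch evaluates the Reynolds number for one set of inputs. The kinematic viscosity is derived from the water density and viscosity quoted in Sect. 3.1.2; the flux `u` and grain diameter `d` are hypothetical values chosen for illustration and are not taken from the simulations or from Table 1.

```python
def reynolds_number(u, d, nu):
    """Grain-scale Reynolds number, Re = u * d / nu (SI units)."""
    return u * d / nu

# Kinematic viscosity of water from the values quoted in Sect. 3.1.2:
# mu = 0.9 cP = 0.9e-3 Pa s, rho = 997 kg/m^3.
nu_water = 0.9e-3 / 997.0  # m^2/s

# Hypothetical inputs, for illustration only.
u = 5.0e-4  # m/s, assumed fluid discharge per unit area
d = 1.0e-3  # m, assumed mean grain diameter

re = reynolds_number(u, d, nu_water)
darcy_applicable = re <= 1.0  # criterion used in the text
print(f"Re = {re:.3f}, Darcy's law applicable: {darcy_applicable}")
```

In practice the same function would be evaluated cell by cell on the simulated flux field, and the maximum compared against the threshold.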

Table 1 Petrophysical properties for used quartz sands, as obtained from local, ex situ measurements

2.2 Experimental Setup

The front and back panels of the FluidFlower are mounted on a portable aluminum frame, such that boundaries are closed on the sides and bottom (no flow). The top surface is open and in contact with fluctuating atmospheric pressure (Fig. 1). A fixed water table above the top of the porous medium was kept throughout the experiments conducted here. The experimental setup incorporates mass flow controllers to inject gaseous \(\hbox {CO}_2\) at the desired rate, and a high-resolution digital camera with time-lapse function (Haugen et al. 2023, this issue).

Experiments were conducted in 2021 and 2022 in Bergen (Norway) at room temperature (\(\approx 23\,^\circ\)C) and ambient atmospheric pressure. Temperature changes were minimized as much as possible, but maintaining a constant temperature was not possible in the available laboratory space. The fluids and sands were set in the FluidFlowers using the following procedure:

  1. The silica sands are cleaned using an acid solution of water and HCl to remove carbonate impurities.

  2. The FluidFlower rig is filled with deionized water.

  3. Sands are manually poured into the rig using the open top boundary, in order to construct the desired porous medium.

  4. A pH-sensitive, deionized-water solution containing bromothymol blue, methyl red, hydroxide and sodium ions is injected through multiple ports until the rig is fully saturated. This enables direct visualization of \(\hbox {CO}_2\) gas (white), dissolved \(\hbox {CO}_2\) (yellowish orange to red), and pure water (dark teal).

  5. 5.0 purity (99.999%) \(\hbox {CO}_2\) is injected as gaseous phase at the desired rate. \(\hbox {CO}_2\) is injected through dedicated ports directly into the rig (Fig. 1).

  6. After the injection phase, injection ports are closed and \(\hbox {CO}_2\) migration continues.

  7. Once the experiment is finished, the rig can be flushed with deionized water and the process can start again from step 4.

Full details on the fluids are given in Fernø et al. (2023, this issue) and Eikehaug et al. (2023, this issue). Below, we refer to the pH-sensitive solution in the rigs as “dyed water”.

2.3 Porous Media Geometries

The geometries of the porous media used in this paper aim to recreate the trap systems observed in faulted, siliciclastic, petroleum-bearing basins around the world, given the geometrical constraints of the FluidFlowers and manual sand pouring (Fernø et al. 2023; Eikehaug et al. 2023, this issue). Features such as folds, faults and unconformities were built in both Tanks 1 and 2. The construction of faults, shown in Fig. 1b and detailed in Haugen et al. (2023, this issue), requires a minimum effective “fault-plane” thickness; hence, our fault structures are thicker than natural faults with the same displacement (Childs et al. 2009). Fine sands (\(d\approx 0.2\) mm) are used to represent sealing or caprock formations.

The geometry in Tank 1 (Fig. 1c) contains three main high-permeability reservoirs (F sand). The bottom and middle F sand are separated by a seal (ESF sand), while the middle and top are separated by the C sand and connected through a higher permeability fault (refer to Sect. 2.1 for petrophysical properties). The fault separates the bottom section into two compartments. The bottom and top F sand provide anticlinal traps for the \(\hbox {CO}_2\) to accumulate in.

The geometry in Tank 2 (Fig. 1e) was specifically motivated by the structure of North Sea reservoirs and petroleum basins. From bottom to top, it contains two sections of decreasing-permeability reservoirs capped by two main sealing layers. A fault separates the bottom section into two compartments, while two faults separate the top section into three compartments. Each fault has different petrophysical properties: The bottom fault is a heterogeneous structure containing ESF, C, D, F and G sands, the top-left fault is an impermeable structure made of silicone and the top-right fault is a conduit structure containing G sand.

2.4 Experimental Injection Schedule

The injection schedules for experiments in Tanks 1 and 2 are provided in Table 2. Injection ports have an inner diameter of 1.8 mm.

Table 2 Schedules for the three \(\hbox {CO}_2\) injection experiments simulated in this work

3 Numerical Simulations

3.1 Model Setup

The isothermal simulations presented in this work were performed with the MATLAB Reservoir Simulation Toolbox, MRST (Krogstad et al. 2015; Lie 2019; Lie and Møyner 2021). Specifically, we used the black-oil module, which is based on fully implicit solvers with automatic differentiation, and assigned properties of water to the oleic phase, such that the gaseous phase (\(\hbox {CO}_2\) only) can dissolve in it. Vaporization of water into the gas phase and chemical reactions are not considered, because they are not primary controls on fluid migration for our operational setup and analysis time.

In addition to structural and dissolution trapping, we also considered residual trapping (Juanes et al. 2006) to be consistent with local measurements showing nonzero trapped gas saturation (Sect. 2.1). This is achieved through hysteretic relative permeability curves for the nonwetting (gas) phase (see Sect. 3.2). Our implementation in MRST follows ECLIPSE’s technical description (Schlumberger 2014), and Killough’s (1976) model is used to compute the scanning curves (Saló-Salgado et al. 2023, forthcoming). Physical diffusion was also included through the addition of a diffusive flux term with a scalar, constant coefficient in the computation of the total \(\hbox {CO}_2\) flux (Bear 1972).

The simulator requires very small time-steps (seconds to minutes) due to the buoyancy of \(\hbox {CO}_2\) at atmospheric conditions and high sand permeabilities (Table 1). Linear solver time was reduced by means of AMGCL (Demidov and Rossi 2018; Lie 2019), an external, pre-compiled linear solver. The greatest challenge was the convergence of the nonlinear solver, which required many iterations and time-step cuts. This is consistent with the experience reported by the groups participating in the FluidFlower international benchmark study (Flemisch et al. 2023, this issue).

Next, we describe the computational grids for experiments in Tanks 1 and 2, PVT properties and boundary conditions. Petrophysical properties are specific to each model version and are detailed in Sect. 3.2.

3.1.1 Computational Grids

A front panel image of the porous medium was used to obtain layer contact coordinates using vector graphics software (Fig. 2a). These contacts were then imported into MATLAB to generate the computational grids using the UPR module (Berge et al. 2019, 2021) (Fig. 2b, d). The grids were generated in 2D and then extruded to 3D (using a single cell layer) to account for thickness and volume. Note that, in Tank 1, where the porous medium has dimensions of \(89.7\times 47\times 1.05\) cm, the thickness (space between the front and back panels) is constant (10.5 mm). Tank 2, which is significantly larger (porous medium dimensions \(2.86\times 1.3\times 0.019\) m), has a thickness of 19 mm at the sides; however, it varies towards the middle due to forces exerted by the sand and water, to a maximum of 28 mm. A thickness map obtained after initial sand filling was used to generate our variable-thickness mesh via 2D interpolation (Fig. 2c). Also, the top surface of the porous medium is not flat (height = \(130 \pm 3\) cm).

Fig. 2

Simulation grids overview. a Front-panel view of Tank 1, where the layer contacts have been highlighted in white. b Front view of simulation grid for experiments in Tank 1, with lithologies indicated and colored based on petrophysical properties (see Sect. 3.2). Location of injection wells is shown in red. c Thickness map of simulation grid for experiments in Tank 2. d Front view of simulation grid for experiments in Tank 2, with lithologies indicated and colored based on petrophysical properties. Location of injection wells is shown in red

Our composite PEBI grids (Heinemann et al. 1991) have a Cartesian background and are refined around face constraints (contacts and faults) as well as cell constraints (injection wells) (Berge et al. 2019, 2021). We generated multiple grids to identify the finest grid with which we could afford to simulate Experiment B1 in Tank 2. The selected grid has a cell size \(h\approx 5\) mm and 151,402 cells (Fig. 2d). The grid used for Tank 1 has a similar cell size (\(h\approx 4\) mm and 27,200 cells), which was chosen to reduce grid-size dependencies when applying our matched models to Experiment B1.

3.1.2 PVT Properties

Consistent with experimental conditions, our simulations are conducted at atmospheric conditions (\(T = 25\,^\circ\)C), where the \(\hbox {CO}_2\) is in gaseous state. We employed a thermodynamic model based on the formulations by Duan and Sun (2003), Spycher et al. (2003) and Spycher and Pruess (2005) to calculate the composition of each phase as a function of p, T. The implementation for a black-oil setup is described in Hassanzadeh et al. (2008) and references therein. Given the boundary conditions (Sect. 3.1.3) and dimensions of our experimental porous media, pore pressure changes (\(\Delta p\)) are very small in our simulations (max \(\Delta p \ll 1\) bar). Hence, the fluid properties remain similar to surface conditions, where the water and \(\hbox {CO}_2\) have, respectively, a density of 997 and 1.78 kg/\(\hbox {m}^3\), and a viscosity of 0.9 and 0.015 cP. The maximum concentration of \(\hbox {CO}_2\) in water is \(\approx 1.5\) kg/\(\hbox {m}^3\).

3.1.3 Initial, Boundary and Operational Conditions

Our porous media are fully saturated with water at the beginning of \(\hbox {CO}_2\) injection. No-flow boundary conditions were applied everywhere except at the top boundary, which is at constant pressure and includes a fixed water table a few cm above the top of the porous medium. Injection is carried out via wells completed in a single cell at the corresponding coordinates. The diameter of injection wells is 1.8 mm in both Tank 1 and Tank 2, which operate at a constant flow rate (see Sect. 2). The simulation injection schedule follows the experimental protocol, provided in Table 2. Note that injection rates in our simulations of Experiments A1 and A2 were slightly adjusted during the calibration procedure, as explained in Sects. 3.3 and 4.

3.2 Simulation Model

Three different model versions, denoted Model 1 (\(M_1\)), Model 2 (\(M_2\)) and Model 3 (\(M_3\)), are used throughout this study to evaluate the value of local data in forecasting subsurface \(\hbox {CO}_2\) migration. Each successive model was constructed based on access to an increasing level of local data, with \(M_1\) having access to the least data and \(M_3\) having access to the most data. The model-specific parameters are limited to the following:

  • Petrophysical properties (porosity, permeability, capillary pressure and relative permeability), which depend on available local data and are described in this section.

  • The molecular diffusion coefficient (D). Models 1–3 were calibrated using the same value, \(D = 10^{-9}\,\hbox {m}^2\)/s. Additionally, Model 3 was also calibrated with \(D = 3\times 10^{-9}\,\hbox {m}^2\)/s. Accordingly, where required we denote Model 3 as \(M_{3,1}\) and \(M_{3,3}\).

  • Injection rate. Experiments in Tank 1 were conducted at a very low injection rate (\(I_\text {R} = 2\) ml/min, see Table 2). Given that the mass flow controllers used in Tank 1 may be inaccurate for this rate, the injection rate was also modeled as an uncertain parameter. Model calibration was achieved with \(I_\text {R} \in [1.6,\, 1.8]\) ml/min for all three models.
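The model-specific parameters listed above can be summarized in a small configuration structure. This is a minimal sketch for bookkeeping purposes only; the class and field names are hypothetical and do not correspond to any MRST data structure, and the cumulative data-access lists reflect our reading of "increasing level of local data".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelVariant:
    name: str              # label used in the text
    local_data: tuple      # local measurements available to the model
    diffusion_m2_s: float  # molecular diffusion coefficient D

# Assumption: each model keeps the data available to the previous one.
GRAIN = ("grain size",)
SINGLE_PHASE = GRAIN + ("porosity", "permeability")
ALL_DATA = SINGLE_PHASE + ("entry pressure", "S_wc", "S_gt")

VARIANTS = (
    ModelVariant("M1", GRAIN, 1e-9),
    ModelVariant("M2", SINGLE_PHASE, 1e-9),
    ModelVariant("M3,1", ALL_DATA, 1e-9),
    ModelVariant("M3,3", ALL_DATA, 3e-9),
)

# Calibrated injection-rate interval for Tank 1 (ml/min), all models.
INJECTION_RATE_RANGE_ML_MIN = (1.6, 1.8)

for v in VARIANTS:
    print(f"{v.name}: D = {v.diffusion_m2_s:g} m^2/s, data = {v.local_data}")
```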

All other model characteristics, including the grid and numerical discretization, remain unchanged. Below, we describe the starting petrophysical values for each of our three simulation models. Note that the experimental geometry in Tank 1, used for matching, only contained sands ESF, C, E and F. Properties for sands D and G are also provided because they were required to simulate the experiment in Tank 2 (Fig. 1).

3.2.1 Model 1 (\(M_1\))

For this model, local petrophysical data were limited to a measure of the average grain size (d; see Sect. 2.1 and Table 1). Hence, petrophysical properties were estimated from published data in similar silica sands. Porosity was selected from data in Beard and Weyl (1973) and Smits et al. (2010) for moderately to well-sorted sands. Permeability was obtained from fitting a Kozeny–Carman model to data in Beard and Weyl (1973) and Trevisan et al. (2014). The resulting equation has the form \(k = \beta d^2\phi ^3\), where \(\beta\) equals 12,250 in our fit with d in mm and k in D. Obtained porosity and permeability values are provided in Table 3.
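The fitted permeability relation can be evaluated as in the short sketch below. It assumes that porosity enters as a fraction (the text specifies units only for d and k), and the example grain size and porosity are hypothetical rather than values from Table 3.

```python
BETA = 12250.0  # fitted coefficient from the text (d in mm, k in darcy)

def permeability_darcy(d_mm, phi):
    """k = beta * d^2 * phi^3, with d in mm and phi as a fraction (assumed)."""
    return BETA * d_mm ** 2 * phi ** 3

# Hypothetical sand, for illustration only.
k = permeability_darcy(d_mm=0.5, phi=0.4)
print(f"k = {k:.0f} D")
```

Because k scales with the square of grain size, doubling d at fixed porosity increases the estimated permeability fourfold.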

Table 3 Initial porosity and permeability for Model 1

Capillary pressure curves were computed as described below:

  1. Capillary pressure measurements in a similar system were obtained from the literature. In this case, Plug and Bruining (2007) measured capillary pressure curves on the unconsolidated quartz sand-\(\hbox {CO}_2\)-distilled water system at atmospheric conditions. We used their measurements on sand packs with an average particle size between 0.36 and 0.41 mm, which are closest to the C sand in our experiments (Fig. 3a).

  2. A Brooks and Corey (1964) model of the form \(p_\text {c} = p_\text {e}(S_\text {w}^*)^{-\frac{1}{\lambda }}\) was fitted to these data, where \(p_\text {e}\) is the nonwetting phase entry pressure at \(S_\text {w} = 1\), \(\lambda = 2.6\) and \(S_\text {w}^* = \frac{S_\text {w} - S_{\text {wc}}}{1 - S_{\text {wc}}}\) is the normalized water saturation with irreducible or connate water saturation \(S_\text {wc}\). This fit led to our reference curve, \(p_\text {cr}\) (Fig. 3a).

  3. The capillary pressure depends on the pore structure of each material, such that sands with different grain sizes require different \(p_\text {c}\) curves. The capillary pressure variation can be modeled by means of the dimensionless J-function proposed by Leverett (Leverett 1941; Saadatpoor et al. 2010): \(J(S_\text {w}) = \frac{p_\text {c}}{\sigma \cos \theta }\sqrt{\frac{k}{\phi }}\), where \(\sigma\) is the surface tension and \(\theta\) the contact angle. Assuming the same wettability and surface tension for different sand regions, and the same shape of the \(p_\text {c}\) curve, the capillary pressure for any given sand (\(p_\text {cs}\)) can be obtained from the reference curve as \(p_\text {cs}(S_\text {w}) = p_\text {cr}(S_\text {w})\sqrt{\frac{k_\text {r} \phi _\text {s}}{k_\text {s}\phi _\text {r}}}\) (Fig. 3b).
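The Brooks–Corey reference curve and the Leverett scaling described in the steps above can be sketched as follows. Only \(\lambda = 2.6\) is taken from the text; the entry pressure, connate water saturation, permeabilities and porosities are hypothetical placeholders.

```python
import math

LAMBDA = 2.6  # Brooks-Corey exponent reported in the text

def normalized_sw(sw, swc):
    """S_w* = (S_w - S_wc) / (1 - S_wc)."""
    return (sw - swc) / (1.0 - swc)

def brooks_corey_pc(sw, pe, swc, lam=LAMBDA):
    """p_c = p_e * (S_w*)^(-1/lambda), valid for S_wc < S_w <= 1."""
    return pe * normalized_sw(sw, swc) ** (-1.0 / lam)

def leverett_scale(pc_ref, k_ref, phi_ref, k_s, phi_s):
    """p_cs = p_cr * sqrt((k_r * phi_s) / (k_s * phi_r))."""
    return pc_ref * math.sqrt((k_ref * phi_s) / (k_s * phi_ref))

# Hypothetical reference sand and target sand, for illustration only.
pe_ref, swc = 3.0, 0.1                      # mbar, fraction
pc_ref = brooks_corey_pc(1.0, pe_ref, swc)  # equals p_e at S_w = 1
pc_sand = leverett_scale(pc_ref, k_ref=400.0, phi_ref=0.4,
                         k_s=100.0, phi_s=0.4)
print(pc_ref, pc_sand)
```

With equal porosities, a fourfold lower permeability doubles the scaled capillary pressure, which is the expected behavior for a finer sand.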

Fig. 3

Multiphase flow properties for Model 1. a Capillary pressure measurements and reference curve using a Brooks and Corey (1964) function. b Initial capillary pressure curves, computed from the reference curve using Leverett scaling (see main text). c Relative permeability data (squares and \(S_\text {w}^5\) model) and our fitted Corey model. d, e Relative permeability of gas and water, respectively. The drainage curve is shown as a solid line, while the bounding imbibition curve is shown for sands ESF and G as a discontinuous line. No relative permeability hysteresis was considered for the water phase

Drainage relative permeabilities were obtained from \(\hbox {CO}_2\)-water measurements by DiCarlo et al. (2000), who used water-wet sandpacks with 0.25 mm grain size. Specifically, we used the data reported in their Figs. 4 and 5 and fitted Corey-type functions (Corey 1954; Brooks and Corey 1964) of the form \(k_\text {rw} = (S_\text {w}^*)^a\) and \(k_\text {rg} = c(1-S_\text {w}^*)^b\) (Fig. 3c). The fitted exponents a and b are 4.2 and 1.4, respectively, while c is 0.97. We assumed that the difference in relative permeability of different sands is the result of different irreducible water saturation only (see Fig. 3d,e). For each of our sands, \(S_\text {wc}\) was obtained from Timur (1968) as \(S_\text {wc} = 0.01\left( 3.5\frac{\phi ^{1.26}}{k^{0.35}}-1\right)\), where \(\phi\) is in percent and k in mD. This model was used to compute \(S_\text {wc}\) for both the \(p_\text {c}\) and \(k_\text {r}\) curves.
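The drainage curves and the Timur estimate can be sketched as below. The Corey parameters are those reported in the text. The Timur relation is read here as returning a percentage, so that the factor 0.01 applies to the whole expression; that grouping, the clamping of the normalized saturation to [0, 1], and the example porosity and permeability are assumptions made for this illustration.

```python
A_EXP, B_EXP, C_END = 4.2, 1.4, 0.97  # fitted Corey parameters from the text

def timur_swc(phi_percent, k_md):
    """S_wc (fraction) from Timur (1968); phi in percent, k in mD.
    Assumed grouping: 0.01 * (3.5 * phi^1.26 / k^0.35 - 1)."""
    return 0.01 * (3.5 * phi_percent ** 1.26 / k_md ** 0.35 - 1.0)

def normalized_sw(sw, swc):
    """S_w* clamped to [0, 1] to avoid invalid powers outside the range."""
    return min(1.0, max(0.0, (sw - swc) / (1.0 - swc)))

def corey_krw(sw, swc):
    """k_rw = (S_w*)^a."""
    return normalized_sw(sw, swc) ** A_EXP

def corey_krg(sw, swc):
    """k_rg = c * (1 - S_w*)^b (drainage)."""
    return C_END * (1.0 - normalized_sw(sw, swc)) ** B_EXP

# Hypothetical sand: 40 % porosity, 1000 D (1e6 mD) permeability.
swc = timur_swc(phi_percent=40.0, k_md=1.0e6)
print(f"S_wc = {swc:.3f}")
print(f"k_rw(0.5) = {corey_krw(0.5, swc):.4f}, k_rg(0.5) = {corey_krg(0.5, swc):.4f}")
```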

In \(\hbox {CO}_2\) storage, secondary imbibition occurs where the water displaces buoyant gas at the trailing edge of the \(\hbox {CO}_2\) plume, disconnecting part of the \(\hbox {CO}_2\) body into blobs and ganglia and rendering them immobile (Juanes et al. 2006, and references therein). This means that the maximum water saturation that can be achieved during imbibition equals \(1 - S_\text {gt}\), where \(S_\text {gt}\) is the trapped gas saturation. Here, we used measurements in sandpacks from Pentland et al. (2010) to determine \(S_\text {gt}\). In particular, we fitted the Land (1968) model, which has the form \(S_\text {gt}^* = \frac{S_\text {gi}^*}{1+CS_\text {gi}^*}\), where \(S_\text {g}^* = \frac{S_\text {g}}{1-S_\text {wc}} = 1 - S_\text {w}^*\), \(S_\text {gi}\) is the gas saturation at flow reversal, and C is Land’s trapping coefficient with a value of 5.2 in our fit. Although Pentland et al. (2010) report that the best fit is achieved with the Aissaoui (1983) and Spiteri et al. (2008) models (cf. their Fig. 5), Land’s model was chosen here, given that most relative permeability hysteresis models build on this one (see next paragraph).
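As a worked illustration of Land's relation, the sketch below converts a gas saturation at flow reversal into a trapped gas saturation using the fitted coefficient C = 5.2. The connate water saturation and the reversal saturations are hypothetical inputs.

```python
LAND_C = 5.2  # Land trapping coefficient from the fit described in the text

def land_trapped_gas(sgi, swc, c=LAND_C):
    """Trapped gas saturation from Land (1968).

    S_gt* = S_gi* / (1 + C * S_gi*), with S_g* = S_g / (1 - S_wc).
    """
    sgi_star = sgi / (1.0 - swc)
    sgt_star = sgi_star / (1.0 + c * sgi_star)
    return sgt_star * (1.0 - swc)

# Hypothetical connate water saturation, for illustration only.
swc = 0.1
for sgi in (0.2, 0.5, 0.9):
    print(f"S_gi = {sgi:.1f} -> S_gt = {land_trapped_gas(sgi, swc):.3f}")
```

The trapped saturation increases monotonically with the saturation at flow reversal and reaches its maximum, \((1 - S_\text {wc})/(1 + C)\), when reversal occurs at \(S_\text {gi} = 1 - S_\text {wc}\).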

Nonwetting phase trapping contributes to irreversibility of the relative permeability and capillary pressure curves (hysteresis). Here, we accounted for this mechanism in the gas relative permeability due to its importance in subsurface \(\hbox {CO}_2\) migration (Juanes et al. 2006, and references therein). In particular, we used Land’s (1968) model to compute the bounding imbibition curve (see Fig. 3d), where \(S_\text {gt}\) is obtained as described above, and Killough’s (1976) model to characterize the scanning curves. In Killough’s model, the scanning curves are reversible, such that the relative permeability at \(S_\text {g} < S_\text {gi}\) no longer depends on the displacement type.

3.2.2 Model 2 (\(M_2\))

This model had access to local, ex situ measurements of single-phase petrophysical properties, i.e., porosity and intrinsic permeability (see Sect. 2.1 and Table 1). Comparing with Table 3, it can be seen that our estimation for Model 1 above was correct to the order of magnitude, but resulted in smaller values: porosity \(\in [85,\, 93]\%\) and permeability \(\in [53,\, 84]\%\) of the local measurements.

Capillary pressures and relative permeabilities were obtained using the same procedure described above for Model 1. The slight differences with respect to the curves shown in Fig. 3b, d, e come from the porosity and permeability values used in the Leverett scaling and to determine \(S_\text {wc}\), which were taken from Table 1 instead. The obtained curves for Model 2 are provided in Fig. 4.

Fig. 4

Multiphase flow properties for Model 2. a Initial capillary pressure curves, computed from the reference curve using Leverett scaling (see main text). b, c Relative permeability of gas and water, respectively. The drainage curve is solid, while the bounding imbibition curve is shown for sands ESF and G as a discontinuous line. No relative permeability hysteresis was considered for the water phase

3.2.3 Model 3 (\(M_3\))

This model was allowed access to all local, ex situ measurements (see Table 1). Initial porosity and permeability remain unchanged with respect to Model 2. Capillary pressure curves were obtained by scaling the reference curve described in Sect. 3.2.1 and shown in Fig. 3a using the measured entry pressure (Sect. 2.1). The scaling followed the model \(p_\text {cs}(S_\text {w}) = p_\text {cr}(S_\text {w})\frac{p_\text {e}}{p_\text {er}}\), where \(p_\text {e}\) is the measured entry pressure for each sand, and \(p_\text {er}\) is the reference curve entry pressure. The obtained curves are shown in Fig. 5a.
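The entry-pressure scaling used for Model 3 is a simple rescaling of the reference curve, sketched below. The sampled reference curve and the two entry pressures are hypothetical values, not measurements from Table 1.

```python
def scale_by_entry_pressure(pc_ref, pe_sand, pe_ref):
    """p_cs(S_w) = p_cr(S_w) * p_e / p_er."""
    return pc_ref * pe_sand / pe_ref

# Hypothetical reference curve samples and entry pressures (mbar).
pc_ref_curve = [3.0, 4.0, 6.0]
pe_ref = 3.0   # entry pressure of the reference curve
pe_sand = 1.5  # measured entry pressure of a coarser sand (assumed)

pc_sand_curve = [scale_by_entry_pressure(p, pe_sand, pe_ref)
                 for p in pc_ref_curve]
print(pc_sand_curve)
```

Unlike the Leverett scaling of Models 1 and 2, this version uses no porosity or permeability information; the curve shape is preserved and only its magnitude follows the measured entry pressure.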

Fig. 5

Multiphase flow properties for Model 3. a Initial capillary pressure curves, computed according to the entry pressure determined experimentally (see Sect. 2.1). b, c Relative permeability of gas and water, respectively, according to the endpoints determined experimentally (Sect. 2.1). The drainage curves are solid, while the bounding imbibition curves are shown as a discontinuous line. The inset in b is a zoom view around the trapped gas saturation. No relative permeability hysteresis was considered for the water phase

Relative permeabilities were computed following the same procedure described for Model 1 above. In this case, however, each sand type was assigned the measured \(S_\text {wc}\) and \(S_\text {gt}\) values (see Table 1). This led to differences in both the drainage and imbibition curves, as shown in Fig. 5.

3.3 Model Calibration

Concordance between results obtained with each simulation model (\(M_1\) to \(M_3\)) and the validation experiment in Tank 1 (A1, see Sect. 2.4) is quantitatively assessed by comparing the following quantities (see Fig. 6):

  1. At \(t = 55\) min (end of injection in port \(I_1\)): Areas occupied by free-phase \(\hbox {CO}_2\), and dyed water with dissolved \(\hbox {CO}_2\), in the bottom F reservoir.

  2. At \(t = 154\) min (end of injection in port \(I_2\)): Areas occupied by free-phase \(\hbox {CO}_2\), and dyed water with dissolved \(\hbox {CO}_2\), in the middle and top F reservoirs.

  3. Time at which the first finger touches the tank bottom.

  4. Time at which the first finger (sinking from the top F reservoir) touches the middle C sand.

Experimental values for points 1–2 were obtained by computing areas from time-lapse images using vector-graphics software. Careful visual inspection of color-enhanced images was used to distinguish between free-phase \(\hbox {CO}_2\) (white) and dyed water with dissolved \(\hbox {CO}_2\) (yellowish orange to red), and to identify the times for points 3–4 above. Error in experimental values was estimated to be \(\le 5\%\), based on repeated measurements (points 1–2), and \(\sim 5\) min, based on time-lapse image comparison (points 3–4). In the simulation models, the threshold gas saturation and \(\hbox {CO}_2\) concentration in water used to compute areas were \(S_\text {g} > 10^{-3}\) and \(C_{\text {CO}_2} > 15\%(C_{\text {CO}_2}^\text {max})\approx 0.2\) [kg/\(\hbox {m}^3\)], respectively. The concentration threshold was chosen after a shape comparison of the region with dissolved \(\hbox {CO}_2\). A smaller value of \(C_{\text {CO}_2} > 0.05\) [kg/\(\hbox {m}^3\)] was selected to determine finger times for points 3 and 4 above. Figure 6 shows an overview of the experimental values for points 2 and 3, while Fig. 12 in Sect. 4.2 shows the full comparison with the history-matched/calibrated simulation models.
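The thresholding used on the simulation output can be sketched as follows. The thresholds are those stated above, with \(C_{\text {CO}_2}^\text {max} \approx 1.5\) kg/\(\hbox {m}^3\) from Sect. 3.1.2. The function names, the flat per-cell lists and the rule that a cell counted as free gas is not also counted as dissolved are assumptions of this sketch, not a description of the actual post-processing.

```python
SG_THRESHOLD = 1e-3               # free-phase gas saturation threshold
C_MAX = 1.5                       # kg/m^3, approx. max dissolved concentration
C_AREA_THRESHOLD = 0.15 * C_MAX   # kg/m^3, used for areas
C_FINGER_THRESHOLD = 0.05         # kg/m^3, used for finger arrival times

def plume_areas(cell_areas, gas_saturation, co2_concentration):
    """Return (free-gas area, dissolved-CO2 area) summed over cells.
    Assumption: cells flagged as free gas are excluded from the dissolved area."""
    gas_area = 0.0
    dissolved_area = 0.0
    for area, sg, conc in zip(cell_areas, gas_saturation, co2_concentration):
        if sg > SG_THRESHOLD:
            gas_area += area
        elif conc > C_AREA_THRESHOLD:
            dissolved_area += area
    return gas_area, dissolved_area

def first_arrival_time(times, target_concentrations):
    """First time at which any target cell exceeds the finger threshold.
    target_concentrations[i] holds the target-cell values at times[i]."""
    for t, concs in zip(times, target_concentrations):
        if any(c > C_FINGER_THRESHOLD for c in concs):
            return t
    return None

# Hypothetical four-cell snapshot and three-step time series.
areas = plume_areas([1.0, 1.0, 1.0, 1.0],
                    [0.2, 0.0, 0.0, 0.0005],
                    [1.5, 0.5, 0.1, 0.3])
arrival = first_arrival_time([100, 200, 300],
                             [[0.0, 0.0], [0.01, 0.04], [0.02, 0.06]])
print(areas, arrival)
```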

Fig. 6

Front panel view of Tank 1, showing quantities and times for history matching of numerical models to Experiment A1. a shows areas with gaseous \(\hbox {CO}_2\) (free-phase, black contours) and dyed water with dissolved \(\hbox {CO}_2\) (green contours) at the end of injection. Location of injection ports is shown with a star. b shows the time and location where the first finger touches the bottom of the tank (white arrow), as well as the different lithological units. Note the three F reservoirs labeled ‘inf’, ‘mid’ and ‘sup’, mentioned in the text and other figures

The experiment was conducted first. Afterwards, the process consisted of running Simulation models 1 to 3, in parallel, starting with the petrophysical properties described in Sect. 3.2. Given the number of uncertain variables (four petrophysical properties for each lithological unit, the diffusion coefficient and the injection rate) and the time required to complete a single simulation, a manual history matching method was employed. At the end of each run, quantities 1–4 above were compared and one or more properties were manually changed based on observed concordance and domain knowledge. During the first few runs, only quantities 1 and 2 above were compared. After obtaining a satisfactory areal match, petrophysical properties were further adjusted to match quantities 3 and 4.

4 Results

In Sect. 4.1, we present the results of the first simulation of Experiment A1 with each model and property values detailed in Sect. 3.2. Then, we detail the calibration of simulation models using Experiment A1 and assess the value of local data to history-match \(\hbox {CO}_2\) storage simulation models (Sect. 4.2). Finally, we apply these matched models to Experiment A2, analog for a longer injection in the same geology (Sect. 4.3.1), and to Experiment B1, analog for a larger-scale injection in a different geologic setting (Sect. 4.3.2). We use simulations of Experiments A2 and B1 to assess the forecasting ability of simulation models in different conditions.

4.1 Initial Model Results

Figure 7 shows the comparison between Experiment A1 and the first run with each model, at times indicated in Sect. 3.3. Numerous differences are evident between the experiment and Models 1 and 2, while Model 3 is much closer to the experiment. In particular, Models 1 and 2 overestimate the extent of \(\hbox {CO}_2\)-rich brine and underestimate the amount of gaseous \(\hbox {CO}_2\) in all F reservoirs (refer to Fig. 6 for location). Model 3 provides a much better approximation of the areal extent of gaseous \(\hbox {CO}_2\) in all regions, as well as of the \(\hbox {CO}_2\)-rich brine in the middle and upper F reservoirs. Model 2 provides the closest finger migration times (points 3 and 4 in Sect. 3.3), although this was not evaluated in the first run, as discussed below.

Petrophysical properties for Models 1 and 2 were obtained from references in Sect. 3.2, which also used silica sands with similar grain sizes. However, despite the relatively homogeneous nature of our quartz sands, Model 3 is significantly more concordant. This result stems from natural sand variability and highlights the difficulty in establishing general, representative elementary volume-scale properties for porous media (see, for instance, Hommel et al. 2018; Schulz et al. 2019, for a discussion on intrinsic permeability). Additionally, results in Fig. 7 highlight the need for conducting sand/rock-specific measurements, even in the case of well-sorted, homogeneous sediments.

Fig. 7

Comparison between Experiment A1 in Tank 1 (left column) and first-run simulation results with Models 1–3. Color map in simulation plots refers to \(\hbox {CO}_2\) concentration in water, according to color bar. The white contours in simulation plots indicate \(S_\text {g} = 10^{-3}\). a–d End of injection in port 1. e–h End of injection in port 2. i–l Time at which the first finger touches the tank bottom. m–p Time at which the first finger touches the middle C sand

4.2 Manual History Matching and Value of Local Data

Figure 8 shows convergence of areas occupied by free gas (\(A_\text {g}\)) and water with dissolved \(\hbox {CO}_2\) (\(A_\text {d}\)), according to Sect. 3.3. Each iteration corresponds to a successive model with manually updated parameters, and the different F sand regions evaluated in each panel (a) to (f) are provided in Fig. 6. With the exception of \(A_\text {d}\) in the lower F sand, Model 3 is accurate from the beginning, and all areas were satisfactorily matched after four iterations. Conversely, Models 1 and 2 were significantly off the experimental reference during the first few iterations. Model 2, however, was accurate after five iterations, while Model 1 required seven iterations to give satisfactory areal estimates. The mean absolute error (MAE) over the six areal quantities presented in Fig. 8 is evaluated in Fig. 9, where it can be seen that, while all models are accurate towards the end (MAE \(\in [5, 10]\,\hbox {cm}^2\)), this required a sixfold improvement in Models 1 and 2, but only a twofold improvement in Model 3. As mentioned in Sect. 3.3, \(C_{\text {CO}_2} > 15\%(C_{\text {CO}_2}^\text {max})\approx 0.2\) [kg/\(\hbox {m}^3\)] was used as the threshold to determine areas. While the absolute values and error would change with a different \(C_{\text {CO}_2}\) threshold, we checked that the relative accuracy of our calibrated models does not change when using \(C_{\text {CO}_2} > 0.01\) or \(C_{\text {CO}_2} > 0.1\) [kg/\(\hbox {m}^3\)].
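As a minimal sketch of the error measure, the MAE over the six areal quantities is the average of the absolute differences between model and experimental areas. The numbers below are invented for illustration and are not the values plotted in Figs. 8 and 9; the function name is likewise an assumption.

```python
def mean_absolute_error(model_values, reference_values):
    """Mean absolute difference between paired model and reference values."""
    if len(model_values) != len(reference_values):
        raise ValueError("model and reference must have the same length")
    total = sum(abs(m - r) for m, r in zip(model_values, reference_values))
    return total / len(model_values)


# Hypothetical areas in cm^2 for the six quantities (A_g and A_d in three F sands).
experiment_areas = [100.0, 250.0, 40.0, 180.0, 60.0, 300.0]
model_areas = [90.0, 262.0, 45.0, 170.0, 66.0, 295.0]

mae_cm2 = mean_absolute_error(model_areas, experiment_areas)
```

Because the MAE is expressed in the same units as the areas, it changes with the concentration threshold used to delineate them, which is why the threshold sensitivity check described above matters.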

Fig. 8

Convergence of areas occupied by free gas (\(A_\text {g}\), left column) and water with dissolved \(\hbox {CO}_2\) (\(A_\text {d}\), right column), during the calibration of Models 1–3 with Experiment A1. \(A_\text {d}\) includes area with gaseous \(\hbox {CO}_2\) (see Fig. 6). Each iteration represents a new simulation run, and the experimental reference (E) is shown as a black line. Refer to Fig. 6 for region location, and to Sect. 3.3 for calibration procedure. a, b: Upper F sand. c, d: Middle F sand. e, f: Lower F sand

Fig. 9

Convergence of mean absolute error over the six areal quantities measured during the calibration process. The error is computed with respect to experimental values. See Fig. 8 for areas measured, and refer to Sect. 3.3 for calibration procedure

Agreement between simulations and experimental observations is readily seen in Fig. 10, where the 1:1 line indicates perfect concordance. The degree of concordance can be quantified by means of Lin’s concordance correlation coefficient (CCC) (Lin 1989; Oldenburg 2018), which, for N-valued observation (x) and model (y) vectors (the six areal quantities), is computed as:

$$\begin{aligned} \text {CCC} = \frac{2\sigma _{xy}}{\sigma _x^2 + \sigma _y^2 + ({\overline{x}}-{\overline{y}})^2} \end{aligned}$$
(2)

where \({\overline{x}}\) and \({\overline{y}}\) are the means, \(\sigma _x^2\) and \(\sigma _y^2\) the variances, and \(\sigma _{xy}\) the covariance, all calculated using 1/N normalization. Results in Fig. 10 show that model calibration results in very good concordance for all models (CCC \(\ge\) 0.99).
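Equation 2 can be evaluated directly from its definition. The following is a minimal sketch using population (1/N) statistics as stated above; the function name and the sample vectors are assumptions made for illustration.

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient with 1/N normalization (Eq. 2)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical observation (x) and model (y) vectors, in cm^2.
observed = [100.0, 250.0, 40.0, 180.0, 60.0, 300.0]
modeled = [90.0, 262.0, 45.0, 170.0, 66.0, 295.0]
ccc = lin_ccc(observed, modeled)
```

Unlike the Pearson correlation, the \(({\overline{x}}-{\overline{y}})^2\) term in the denominator penalizes a constant offset between model and observation, so a biased but perfectly correlated model scores below 1.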

Fig. 10

Concordance between successive model iterations and the experiment, based on six areal measures evaluated during the calibration. Lin’s CCC (Lin 1989) is shown in the key of each subplot, computed according to Eq. 2. a Model 1. b Model 2. c Model 3

Convergence of quantities 3 and 4 in Sect. 3.3, the times at which the first finger touches the rig bottom and the middle C sand, respectively, is provided in Fig. 11. These times were only evaluated after a satisfactory areal match for quantities in Fig. 8 was achieved. Therefore, areas no longer change much in the last few iterations in Fig. 8. In Fig. 11, it can be seen that Models 2 and 3, which incorporated local intrinsic permeability measurements, were significantly closer to our experimental reference than Model 1. Initially, however, we observed that sinking of gravity fingers in the experiment was faster than our model values by a factor of \(\approx 2\). A satisfactory match of all quantities evaluated was achieved after 11, 8, and 7 iterations for Models 1–3, respectively.

Fig. 11

Convergence of times at which the first finger touches the bottom of the rig (a) and the middle C sand (b), during the calibration of Models 1–3 with Experiment A1. Refer to Sect. 3.3 for calibration procedure

Overall, we find that Model 3, with access to local single-phase and multiphase flow properties, is closer to the experimental reference (i.e., more concordant) from the start. Model 1 started farthest and required significantly more effort for calibration. After the calibration process, all models achieve very good concordance (CCC \(\ge 0.99\)), based on evaluated quantities (Fig. 10). The calibration shown in Figs. 8, 9, 10 and 11 employs \(D = 10^{-9}\,\hbox {m}^2\)/s in all model versions (\(M_1\) to \(M_3\)). Injection rates (\(I_\text {R}\)) started at 2.0 ml/min for all three models and were 1.6 ml/min, 1.8 ml/min and 1.75 ml/min, respectively, at the end of the calibration. \(I_\text {R}\) is slightly different because the goal was to obtain the best match with each model, considering \(I_\text {R}\) to be an uncertain variable. In Sect. 4.3, the same \(I_\text {R}\) is used to make forecasts with all three models.

Table 4 compares the starting and final (matched) key petrophysical variables for each model. The models were successfully calibrated by adjusting intrinsic permeability and the capillary pressure curves (same shape, but scaled to higher or lower \(p_\text {e}\)) only. It was found that \(\hbox {CO}_2\) migration was most sensitive to the properties of the F sand, where most of the \(\hbox {CO}_2\) migration occurs, as well as the ESF seal, which structurally traps the \(\hbox {CO}_2\) plume. In our matched models, \(p_\text {e}\) of ESF is about twice the measured value; this was required because the minimum saturation at which we can define \(p_\text {e}\) and ensure numerical convergence is \(S_\text {g} \approx 10^{-4}\). Reality, however, is closer to a jump in \(p_\text {c}\) from 0 to \(p_\text {e}\) at an infinitesimally small \(S_\text {g}\). Additionally, we found that concordance improved when using different values for the C and F sands in different model regions. In the case of the C sand, the explanation lies in the fault construction process, which may reduce porosity with respect to “natural” sedimentation of stratigraphic layers (Haugen et al. 2023, this issue). The increase in F sand permeability was required to match finger migration times and is possibly compensating for the absence of mechanical dispersion in the simulations. This is discussed in Sect. 5. Our calibrated values are within the same order of magnitude as the ex situ measurements (Table 4) and history-matched values for the porous medium in Tank 2 (Landa-Marbán et al. 2023, forthcoming).

Table 4 Petrophysical properties for used quartz sands in Experiment A1

Figure 12 shows gas saturation (\(S_\text {g}\)) and \(\hbox {CO}_2\) concentration (\(C_{\text {CO}_2}\)) maps at times at which quantities 1–4 described in Sect. 3.3 are evaluated. Snapshots are provided for Model 3 only, since all three calibrated models were qualitatively very similar. It can be seen that \(\hbox {CO}_2\) migration is successfully approximated by our numerical model. In detail, however, some differences are apparent: Firstly, sinking of \(\hbox {CO}_2\)-rich water from the bottom injector and horizontal migration along the bottom of the rig is faster in the model. This is due to the higher permeability that our numerical model requires in order to match the gravity fingering advance (cf. Table 4). Secondly, the experiment shows that denser, \(\hbox {CO}_2\)-rich water sinks with a rather compact front and closely spaced, wide fingers. Our model with constant \(D = 10^{-9}\) \(\hbox {m}^2\)/s approximates all gravity-driven migration of the \(\hbox {CO}_2\)-rich water through thinner fingers, with the \(\hbox {CO}_2\)-saturated region receding with \(S_\text {g}\). To better represent fingering widths, we also matched Model 3 with \(D = 3\times 10^{-9}\,\hbox {m}^2\)/s, used in Sect. 4.3.2.

Fig. 12

Comparison between Experiment A1 in Tank 1 (left column) and simulation results with Model 3 after calibration (gas saturation shown in middle column, and \(\hbox {CO}_2\) concentration shown in right column). Location of injection ports shown by black stars in d. \(D = 10^{-9}\,\hbox {m}^2\)/s. a–c End of injection in lower port. d–f End of injection in upper port. g–i Time at which the first finger touches the rig bottom. j–l Time at which the first finger touches the middle C layer

4.3 Transferability: Model Forecasts

A key question after history matching a flow simulation model is whether the physical description has actually been improved, or whether parameters have been modified to match a set of specific observations only. By applying the history-matched models to a different injection protocol (Experiment A2 in Tank 1; refer to Table 2) and subsequently to a different geometry (Experiment B1 in Tank 2), this can be assessed to some extent.

4.3.1 Analog for a Longer \(\hbox {CO}_2\) Injection in the Same Geologic Setting

This case illustrates concordance of our history-matched models in a much longer injection in the same geology (Experiment A2). Before simulating this case, we observed that the trapped gas column against the fault in the experiment was different than what could be achieved with our previous \(p_\text {e}\) for Models 1–3 (Table 4). Because the capillary properties of the C sand in the fault were not directly involved in Experiment A1, we increased \(p_\text {e}\) in our calibrated models for that specific region (\(p_\text {e}\) = 5 mbar against the lower F sand, and 3.5 mbar against the middle F sand). All other parameters were taken from the values calibrated to match Experiment A1.

Evaluation was performed at the end of injection, at \(t = 4\) h 48 min, with a single run with Models 1–3. \(I_\text {R}\) and D were set to the same value in all three models: 1.7 ml/min and \(10^{-9}\,\hbox {m}^2\)/s, respectively. The experimental result is shown in Fig. 13a, while the simulation with Model 3 is depicted in Fig. 13b, c. We observe that the general distribution of \(\hbox {CO}_2\) is close to the experimental truth. However, the experiment shows a compact sinking front of the \(\hbox {CO}_2\)-rich water without fingers; in our model, gravity fingering is apparent at this stage and fingers are close to the bottom of the rig. Additionally, \(\hbox {CO}_2\)-saturated brine touches the right boundary in the upper F reservoir, which does not occur in the experiment. This is due to capillary breach of the C sand above the middle F reservoir, as shown in Fig. 13b, and can be avoided by reducing the gas saturation value at which \(p_\text {e}\) is defined, or by increasing \(p_\text {e}\).

Fig. 13

Comparison between Experiment A2 in Tank 1 (a) and simulation results with Model 3 (b, c) at the end of the injection phase (\(t=4\) h 48 min)

The comparison of areal quantities is provided in Fig. 14 and demonstrates good-to-very-good concordance. Models 2 (MAE = 16 \(\hbox {cm}^2\), CCC = 0.996) and 3 (MAE = 14.54 \(\hbox {cm}^2\), CCC = 0.996) are similarly accurate and slightly better than Model 1 (MAE = 20.18 \(\hbox {cm}^2\), CCC = 0.988), but there are no marked differences.

Fig. 14

a: Comparison of areas occupied by free gas (\(A_\text {g}\)) and water with dissolved \(\hbox {CO}_2\) (\(A_\text {d}\)) for Experiment A2 in Tank 1. Experimental reference shown with a star (E). \(A_\text {g}\) (F mid, left) not shown because values are very close to 0. Refer to Fig. 6 or Fig. 13a for region location. b Concordance plot for each of the three models, using the same areal quantities as in a. Lin’s CCC (Lin 1989) is shown in the key, according to Eq. 2

4.3.2 Analog for a Larger-Scale \(\hbox {CO}_2\) Injection in a Different Geologic Setting

Finally, we compare the forecasting ability of our calibrated models against Experiment B1, conducted in a larger-scale, more complex geology (Fig. 1e) (Flemisch et al. 2023, this issue). Similar to Sect. 4.3.1, our goal is to assess the forecasting ability of our calibrated models—without changing their properties. However, given that sand D controls migration in the lower fault (see Fig. 2d) and it was not present in our calibrated models, we allowed one change for Models 1 and 2, which did not have access to local \(p_\text {c}\) measurements. This means that we ran an initial simulation of this experiment with Models 1 and 2 and then adjusted the \(p_\text {c}\) curve of the D sand. The selected curve lies at \(\approx \frac{1}{3}\) of the \(p_\text {c}(S_\text {w})\) shown in Fig. 3 and Fig. 4, respectively.

Next, we evaluate concordance of Models 1–3 by comparing them to the experimental truth after a single run. Evaluation is performed over the total duration of the experiment (120 h), which is simulated with the same \(I_\text {R}\) (10 ml/min) and D (\(10^{-9}\,\hbox {m}^2\)/s) in all three models (\(M_1\), \(M_2\), \(M_{3,1}\)). Additionally, a run with \(D=3\times 10^{-9}\,\hbox {m}^2\)/s was completed with Model 3 (\(M_{3,3}\)) to better approximate finger widths, as noted in Sect. 4.2.

Gas saturation and \(\hbox {CO}_2\) concentration maps at the end of injection with Model 1 are shown in Fig. 15b and Fig. 15c, respectively. The full visual comparison is provided in Fig. 16. We make the following observations:

  • At the end of injection (\(t = 5\) h), all three models forecast some migration of \(\hbox {CO}_2\) into Box B. Models 2 (Fig. 16c) and 3 (Fig. 16d) underestimate the amount of \(\hbox {CO}_2\), while Model 1 (Fig. 16b) overestimates the amount of \(\hbox {CO}_2\) in the top C sand.

  • Also at the end of injection, all models forecast faster sinking of the \(\hbox {CO}_2\)-charged water tongue arising from the lower injector. This is due to the higher F sand permeability required to match finger advance (see Sect. 4.2), particularly in Model 3 with \(D = 3\times 10^{-9}\,\hbox {m}^2\)/s.

  • The speed at which \(\hbox {CO}_2\)-rich fingers sink is slightly faster in our models, compared to the experiment. As expected, Model 3, with a higher diffusion coefficient, displays thicker fingers, with closer widths to the experiment. Similar to our previous observations, the numerical models cannot approximate the compact, \(\hbox {CO}_2\)-rich water front closely trailing the fingers.

  • Dissolution of \(\hbox {CO}_2\) is underestimated by Models 1 and 2, while it is closer, but overestimated, by Model 3.

Fig. 15

Comparison between Experiment B1 in Tank 2 (a) and simulation Model 1 (b, c) at the end of injection (\(t = 5\)h). Circles in a denote the location of injection ports

Fig. 16

Comparison between Experiment B1 in Tank 2 (leftmost column) and \(\hbox {CO}_2\) concentration maps for simulation Models 1–3 (middle-left, middle-right and rightmost, respectively). \(D = 10^{-9}\,\hbox {m}^2\)/s (Model 1 and 2), \(D = 3\times 10^{-9}\,\hbox {m}^2\)/s (Model 3). The white contours in simulation plots indicate \(S_\text {g} = 10^{-3}\). a–d End of injection. e–h \(t=15\)h. i–l \(t=24\)h. m–p \(t=48\)h. q–t \(t=120\)h

Consistent with our approach described in Sect. 3.3, quantitative analysis is provided by means of areal quantities over time in Fig. 17. Experimental values were obtained via segmentation of timelapse images, and the data were reported on a \(1\times 1\) cm grid where 0 is pure water, 1 is water with dissolved \(\hbox {CO}_2\), and 2 is gaseous \(\hbox {CO}_2\). The segmentation procedure is explained in Nordbotten et al. (2023, this issue). We then obtained the areas of each phase within Box A and B to generate Fig. 17 (refer to Fig. 15a for box location).
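A minimal sketch of the area computation from the segmented data follows. It assumes the labels are stored as a 2D list with one entry per \(1\times 1\) cm cell; the function name, the toy label map and the box index ranges are hypothetical and do not correspond to the actual extents of Box A or Box B.

```python
def phase_areas_in_box(labels, row_range, col_range, cell_area_cm2=1.0):
    """Area per label (0 water, 1 dissolved CO2, 2 gaseous CO2) inside a box.

    row_range and col_range are half-open (start, stop) index pairs.
    """
    areas = {0: 0.0, 1: 0.0, 2: 0.0}
    for row in labels[row_range[0]:row_range[1]]:
        for label in row[col_range[0]:col_range[1]]:
            areas[label] += cell_area_cm2
    return areas


# Hypothetical 4 x 5 segmented image on a 1 cm grid.
segmented = [
    [0, 0, 1, 1, 2],
    [0, 1, 1, 2, 2],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
box_areas = phase_areas_in_box(segmented, row_range=(0, 3), col_range=(1, 5))

# The dissolved-CO2 area reported in Fig. 17 includes cells with gaseous CO2.
area_dissolved_incl_gas = box_areas[1] + box_areas[2]
```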

In Box A, which contains the main F reservoir and ESF seal, we observe very good concordance (accurate areas) during injection. Afterwards, Model 3 with \(D = 3\times 10^{-9}\,\hbox {m}^2\)/s continues to follow the experiment closely, whereas the others overestimate gaseous \(\hbox {CO}_2\). Note that the PVT properties of our fluids are the same in all models; differences arise due to (1) higher sand F \(S_\text {wc}\) in Model 3, and higher sand F k in Model 2 and especially Model 3 (\(D = 3\times 10^{-9}\,\hbox {m}^2\)/s), compared to Model 1, which allow greater convective mixing (Ennis-King and Paterson 2005) (Table 4); and (2) lower \(p_\text {e}\) and higher k of sand ESF in Model 2 (Table 4), which allows some \(\hbox {CO}_2\) migration into the seal (Fig. 16). In Box B (Fig. 17d–f), Model 1 and Model 3 with \(D = 10^{-9}\) \(\hbox {m}^2\)/s are able to approximately track the experimental truth during injection. However, our models without dispersion cannot capture the areal increase of \(\hbox {CO}_2\)-rich water that occurs afterwards (cf. Fig. 16).

Fig. 17

Comparison of areas occupied by each phase during the first 72 h of case B1. Experimental mean (\({\overline{E}}\)) and standard deviation (std) obtained from four experimental runs with identical protocol, while the results for Models 1–3 are for a single run with each matched model. For \(M_3\), two cases are shown: \(D = 10^{-9}\,\hbox {m}^2\)/s (\(M_{3,1}\)) and \(D = 3\times 10^{-9} \hbox {m}^2\)/s (\(M_{3,3}\)). Top row shows areas in Box A, and bottom row shows areas in Box B. a, d Gaseous \(\hbox {CO}_2\). b, e Dissolved \(\hbox {CO}_2\) (includes area with gaseous \(\hbox {CO}_2\)). c, f Pure water

To put these results in perspective, Fig. 18 provides a comparison with results submitted by the international benchmark study (IBS) participants, as well as Experiment B1 (Flemisch et al. 2023, this issue). Figure 18 presents, for each datapoint, mean Wasserstein distances to experiments and forecasts (simulations by IBS participants). Specifically, the Wasserstein metric (W) measures “the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution” (Panaretos and Zemel 2019). We expect \(W \rightarrow 0\) for two samples from the same distribution, given enough values, and two samples to be more similar or concordant the closer W is to 0. To calculate distances shown in Fig. 18, the cell mass density in a \(1\times 1\) cm grid was estimated for all simulations and experiments and then normalized. Therefore, this metric provides a measure of the overall degree of agreement (i.e., in the whole domain). Resulting distances were dimensionalized using the total \(\hbox {CO}_2\) mass in the system, such that the units are grams \(\times\) centimeter, with values \(< 100\,\text {gr}\,\text {cm}\) and \(< 50\,\text {gr}\,\text {cm}\) representing good concordance and very good concordance, respectively. Details and code are provided by Flemisch et al. (2023, this issue). In Fig. 18, it can be seen that \(M_1\)–\(M_3\) are comparable to or better than the best forecasts by IBS participants. \(M_1\) and \(M_{3,1}\), in particular, achieved very good concordance.
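To build intuition for the metric, the sketch below computes the Wasserstein-1 distance for a one-dimensional row of cells. This is a deliberate simplification: the benchmark works with two-dimensional mass density maps and its own code (Flemisch et al. 2023, this issue), which is not reproduced here. The function name, the mass profiles and the total mass used for dimensionalization are assumptions.

```python
def wasserstein_1d(mass_a, mass_b, cell_size_cm=1.0):
    """Wasserstein-1 distance between two 1D mass profiles on the same uniform grid.

    Both profiles are normalized to unit total mass, so the result is in cm.
    In 1D, the distance equals the integral of the absolute difference
    between the two cumulative distributions.
    """
    if len(mass_a) != len(mass_b):
        raise ValueError("profiles must be defined on the same grid")
    total_a, total_b = sum(mass_a), sum(mass_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("profiles must contain positive total mass")
    distance = 0.0
    cumulative_difference = 0.0
    for a, b in zip(mass_a, mass_b):
        cumulative_difference += a / total_a - b / total_b
        distance += abs(cumulative_difference) * cell_size_cm
    return distance


# Hypothetical profiles: all mass in the first cell versus all mass in the last cell.
w_normalized_cm = wasserstein_1d([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0])

# Dimensionalize with an assumed total CO2 mass (g) to obtain units of g cm.
TOTAL_CO2_MASS_G = 5.0
w_gram_cm = w_normalized_cm * TOTAL_CO2_MASS_G
```

In this toy case the unit mass has to travel three cells, so the normalized distance is 3 cm, and identical profiles give a distance of zero.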

Fig. 18

Wasserstein distances to experiments and forecasts (simulations). Colored circles show forecasts by IBS groups, and results with calibrated Models 1–3 are presented with light gray markers. In each subplot, the vertical axis shows the mean distance between a given datapoint and the forecasts (considering the IBS participants only), while the horizontal axis shows the mean distance between a given datapoint and the experiments. Markers not present fall outside of the axes limits. a 24 h. b 48 h. c 72 h. d 96 h. e 120 h

Further evaluation of simulation model concordance, including comparison with model results before calibration, mass quantities and error measures, is provided in Appendix A. From this analysis (Sect. 4.3 and Appendix A), we find that:

  • All matched models approximate well \(\hbox {CO}_2\) migration and distribution in the domain, seal capacity, and onset of convective mixing. \(M_1\) and \(M_{3,1}\) are most concordant to experiments (Fig. 18).

  • Calibrated models are able to accurately estimate specific quantities during the injection phase, yet they accumulate higher errors later on (Fig. 17 and Appendix B).

  • Similar to Experiment A1, the calibration procedure significantly improved the concordance of \(M_1\) and \(M_2\) with the experiment (Figs. 19 and 16). In Box A, calibration also improved concordance for \(M_3\) (Figs. 20 and 24). Overall, however, matched \(M_{3,1}\) and \(M_{3,3}\) are less concordant than their initial versions, which were already in very good agreement with the experiment (Figs. 21 and 18).

In summary, calibrated models are transferable to a different operational setting or geologic structure, as long as sediments and trap systems remain the same (Experiment A2 and Box A in Experiment B1). Where reservoir connectivity is provided by heterogeneous structures with uncertain properties, accurate deterministic estimates of \(\hbox {CO}_2\) migration are unlikely; models calibrated elsewhere (Experiment A1) were not accurate in our test (Box B in Experiment B1). Given unlimited computational time, the forecasting capability of numerical models calibrated with published data appears similar to those having access to local measurements; the main value of local data lies in reducing the time required for history matching. Obtained results suggest that history matching worsened \(M_3\) forecasts in a different setting (Experiment B1). Therefore, forecasts in a given geologic setting may benefit more from local measurements and accurate physics, rather than history matching, unless historical data of the same setting are available. This is because \(\hbox {CO}_2\)-brine flow is very sensitive to variations in petrophysical properties such as capillary pressure, which will change in different areas, even if the geology is similar.

5 Discussion

In the FluidFlower, strong buoyancy and high permeability lead to persistent appearance and disappearance of fluid phases, as the gas migrates upward and dissolves in the water; coupled with other two-phase flow nonlinearities, these aspects make this problem difficult to solve numerically (e.g., Lie 2019). Comparison between the number of nonlinear iterations and the strength of different physical mechanisms (flow rates, buoyancy, capillarity and dissolution) is presented in Appendix B. A clear correlation can be seen between flow rates and number of iterations. However, buoyancy, capillarity and dissolution all appear to be playing a role, and it is not straightforward to discern which effect dominates; hence, this is a topic that requires further study. We note that difficulties with the convergence of the nonlinear solver have been reported by all participants in the international benchmark study (Flemisch et al. 2023, this issue). As hinted in Sect. 3.1, we addressed this by optimizing linear solver time, reducing the time-step length, increasing the number of time-step cuts and relaxing MRST’s maximum normalized residual where required.

In a 2D isotropic medium and assuming uniform flow, the hydrodynamic dispersion coefficient (\(\underline{{\underline{D}}}_\text {h}\)) can be modeled as \(\underline{{\underline{D}}}_\text {h} = \bigl [ {\begin{matrix}\alpha _\text {L}{\overline{u}} &{} 0\\ 0 &{} \alpha _\text {T}{\overline{u}}\end{matrix}} \bigr ]\), where \(\alpha _\text {L}\) and \(\alpha _\text {T}\) are the longitudinal and transverse dispersivity, respectively, and \({\overline{u}}\) is the average Darcy velocity (Bear 1972). Assuming dispersivities \(\ge 10^{-3}-10^{-2}\) m (Garabedian et al. 1991; Gelhar et al. 1992; Schulze-Makuch 2005) and \({\overline{u}}\approx 3\times 10^{-6}\) m/s (from our simulations), we get \(\underline{{\underline{D}}}_\text {h} \in [3\times 10^{-9}, \, 3\times 10^{-8}]\,\hbox {m}^2\)/s or larger; this means that \(\underline{{\underline{D}}}_\text {h} \ge D\) for the timescales considered (Riaz et al. 2004; Rezk et al. 2022). We also note that numerical dispersivity is on the order of the cell size (\(h \approx 4\) mm in Tank 1, and \(\approx 5\) mm in Tank 2), so it is likely smaller than physical dispersivity. Numerical diffusion can be approximated as uh, which yields maximum values \(\sim O(10^{-7}\, \text {m}^2/\text {s})\) (water phase). However, using the mean of the 75\(^\text {th}\) percentile flow velocity over all time-steps, we obtain \(\sim O(10^{-9}\, \text {m}^2/\text {s})\). Therefore, we estimate that numerical diffusion is lower than physical diffusion almost everywhere in our simulations. Previous work suggested that hydrodynamic dispersion in homogeneous sediments can be accounted for by increasing D (Riaz et al. 2004, 2006), as done here. However, our analysis shows that the spreading of \(\hbox {CO}_2\)-rich water during convective mixing can be loosely, but not accurately, represented by molecular diffusion. Given (1) the dominance of convective mixing on solubility trapping (Ennis-King and Paterson 2005; Neufeld et al. 2010; MacMinn and Juanes 2013); (2) heterogeneity of many natural reservoirs, which increases the importance of dispersion (Riaz et al. 2006; Bear 2018); and (3) the acceleration of \(\hbox {CO}_2\) dissolution due to dispersion, as observed here and by others (e.g., Hidalgo and Carrera 2009), it is important to quantify the balance between diffusion and dispersion to estimate \(\hbox {CO}_2\) trapping.
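The order-of-magnitude estimate of the dispersion coefficient can be reproduced with a few lines of arithmetic. The sketch below uses only the dispersivity bounds, mean Darcy velocity and molecular diffusion coefficient quoted in the text; the function and variable names are assumptions.

```python
def dispersion_coefficient(dispersivity_m, darcy_velocity_m_s):
    """Diagonal entry of the dispersion tensor, alpha * u, in m^2/s."""
    return dispersivity_m * darcy_velocity_m_s


MEAN_DARCY_VELOCITY_M_S = 3e-6   # average Darcy velocity quoted in the text
MOLECULAR_DIFFUSION_M2_S = 1e-9  # D used in the calibrated models

# Lower and upper dispersivities quoted in the text (1e-3 m and 1e-2 m).
d_h_low = dispersion_coefficient(1e-3, MEAN_DARCY_VELOCITY_M_S)
d_h_high = dispersion_coefficient(1e-2, MEAN_DARCY_VELOCITY_M_S)

# Ratio of dispersion to molecular diffusion at both ends of the range.
ratio_low = d_h_low / MOLECULAR_DIFFUSION_M2_S
ratio_high = d_h_high / MOLECULAR_DIFFUSION_M2_S
```

Under these values, dispersion exceeds molecular diffusion by a factor of roughly 3 to 30, which is consistent with the need to raise D to \(3\times 10^{-9}\,\hbox {m}^2\)/s to approximate the observed finger widths.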

Our study of \(\hbox {CO}_2\) injection and migration in unconsolidated sands at atmospheric \(p,\, T\) conditions captures the \(\hbox {CO}_2\)-water system dynamics at short to intermediate timescales: buoyancy-driven flow and structural trapping (Bachu et al. 1994; Bryant et al. 2008; Hesse and Woods 2010; Szulczewski et al. 2013), residual trapping (Juanes et al. 2006; Burnside and Naylor 2014) and convective mixing and dissolution trapping (Weir et al. 1996; Ennis-King and Paterson 2005; Riaz et al. 2006; Neufeld et al. 2010; Hidalgo et al. 2012; MacMinn and Juanes 2013; Szulczewski et al. 2013). Due to the very large sand permeability (\(10^2-10^4\) D), convective mixing and dissolution dominate \(\hbox {CO}_2\) trapping. With respect to values at \(\sim 1\) km depth (\(p \sim 100\) bar, \(T \sim 40\,^{\circ }\hbox {C}\)), the dynamic viscosity and density of \(\hbox {CO}_2\) at our experimental conditions are \(\approx\) 1/3 and \(3\times 10^{-3}\) times as large, respectively. Conversely, previous studies with similar setups used analogous fluids with density and viscosity ratios similar to supercritical \(\hbox {CO}_2\)-brine (Trevisan et al. 2017; Krishnamurthy et al. 2022). Dynamics observed in these systems are similar to ours, with vertical migration of \(\hbox {CO}_2\) dominated by buoyancy and lateral spreading of \(\hbox {CO}_2\) plumes with a main tongue at the top of the aquifer or high permeability layer. A quantitative scaling analysis of the FluidFlower (Tank 2) was performed by Kovscek et al. (2023, this issue), who showed that scaling of physical mechanisms to the field scale is possible. Compared to three \(\hbox {CO}_2\) storage projects (Northern Lights, Sleipner and In Salah), the vertical dimension of the storage reservoir is exaggerated 2 to 3 times. Temporally, 1 h in the FluidFlower is equivalent to \(\sim 100-400\) y in the field; thus, the experiment in Tank 2 (120 h) covers well the injection and post-injection periods. Similar to the FluidFlower, Kovscek et al. (2023, this issue) estimate the onset of convective mixing to occur during injection in high-permeability formations like the Utsira Sand (Sleipner). This analysis demonstrates that observations made in the FluidFlower can be used to describe field-scale fluid dynamics and quantify forecasting accuracy.

Our models retained some error at the end of the calibration phase, which is a known problem of manual history matching (Oliver and Chen 2011). Consistent with previous findings (e.g., Fisher and Jolley 2007), results show that Models 2 and 3, which had access to local data, achieved a faster match to the experimental truth than Model 1 (Sect. 4.2). However, all models seem to have similar forecasting capability (Sect. 4.3). Subsurface heterogeneity and time constraints may explain why, in practice, it is critical to include local data to achieve history matching, and, especially, concordant forecasting (e.g., Gosselin et al. 2003; Fisher and Jolley 2007; Myers et al. 2007; Kam et al. 2015; Avansi et al. 2016). Calibration with Experiment A1 decreased overall concordance of Model 3 to Experiment B1 (but improved concordance in Box A), compared to forecasts with initial (measured) parameter values. We interpret this to be the result of fluid migration in Experiment A1 being controlled by different units than in Box B in Experiment B1. Therefore, local measurements are paramount, especially if historical data in the trap system of interest are not available.

Additionally, we did not quantify uncertainty in the history-matched models because a ground truth was available. In general, however, this is necessary to manage reservoir operations (e.g., Aanonsen et al. 2009; Oliver and Chen 2011; Jagalur-Mohan et al. 2018; Jin et al. 2019; Liu and Durlofsky 2020; Santoso et al. 2021, and references therein). It is also important to note that history-matched models may have grid-size dependencies (see Appendix C), which may require that the grid used to make forecasts, if different or encompassing additional regions, maintain a similar resolution. Finally, multiphase flow in poorly lithified sediments is non-unique (Haugen et al. 2023, this issue), which also contributes to uncertainty. Therefore, it seems prudent to adopt a probabilistic perspective when estimating subsurface \(\hbox {CO}_2\) migration. This is consistent with results in Fig. 18 and Flemisch et al. (2023, this issue): in the highly resolved and geologically simple FluidFlower (compared to the subsurface), forecasts by different simulation groups show a large spread.

6 Conclusions

We performed experiments (Sect. 2) and numerical simulations (Sect. 3) of \(\hbox {CO}_2\) migration in poorly lithified, siliciclastic sediments at the meter scale. Three simulation model versions, with access to different levels of local data, were manually history-matched to the experiments (Sects. 4.1 and 4.2), and then used to make forecasts (Sect. 4.3). The main findings are:

  1. The time required to history match Model 3 (access to both single-phase and multiphase measurements) is shorter than for Model 2 (access to local single-phase measurements), which in turn is shorter than for Model 1 (no access to local petrophysical measurements).

  2. All simulation models achieve a satisfactory qualitative match throughout the experiments. Quantitatively, the forecasting capability of Models 1–3 appears similar: in specific domain regions, models were close to the experimental truth during \(\hbox {CO}_2\) injection, and accumulated larger errors afterwards, especially where heterogeneous structures control \(\hbox {CO}_2\) migration.

  3. Overall, forecasts with Model 3 after calibration in a similar, but not identical, geologic setting were less accurate than forecasts made with measured values. This emphasizes the importance of local measurements and history matching in the same geologic setting.

  4. The addition of a constant molecular diffusion coefficient allows matching convective finger widths to experimental observations. However, simulations without dispersion cannot approximate the compact, \(\hbox {CO}_2\)-rich sinking front closely trailing convective fingers in our experiments.

Simulation models were not always accurate. Given the degree of control in our study, it seems prudent to quantify uncertainty when assessing subsurface \(\hbox {CO}_2\) migration in the field using numerical models. Our results suggest that confidence can be increased by obtaining local data, quantifying petrophysical parameter uncertainty, testing sensitivity to petrophysical parameters in different model regions, using historical data from the same setting, including post-injection data when history matching, and incorporating multiple scenarios of \(\hbox {CO}_2\) migration, particularly where heterogeneous structures are at play.