Efficient solution techniques for two-phase flow in heterogeneous porous media using exact Jacobians

Two efficient and scalable numerical solution methods will be compared using exact Jacobians to solve the fully coupled Newton systems arising during fully implicit discretization of the equations for two-phase flow in porous media. These methods use algebraic multigrid (AMG) to solve the linear systems in every Newton step. The algebraic multigrid methods rely on (i) a Schur Complement Reduction (SCR-AMG) and (ii) a Constrained Pressure Residual method (CPR-AMG) to decouple elliptic and hyperbolic contributions. Both methods employ automatic differentiation (AD) to calculate exact Jacobians and are compared with finite difference (FD) approximations of the Jacobian. The superiority of AD is shown by several numerical test cases from the field of CO2 geo-sequestration comprising two- and three-dimensional examples. A weak scaling test on JUQUEEN, a BlueGene/Q supercomputer, demonstrates the efficiency and scalability of both methods. To achieve maximum comparability and reproducibility, the Portable Extensible Toolkit for Scientific Computation (PETSc) is used as framework for the comparison of all solvers.


Introduction
Applications for two-phase flow in porous media are found in numerous scientific fields, e.g., geothermal energy, the remediation of groundwater contamination by non-aqueous phase liquids (NAPLs), hydrogen emission due to barrel corrosion during nuclear waste management, enhanced oil recovery where oil is produced by the injection of water, and production of oil and gas fields as well as geological storage of CO 2 . In geothermal energy, high-enthalpy reservoirs may contain water as liquid and vapor. Successful production The research leading to these results has received funding from the European Union's Horizon2020 Research and Innovation Program under grant agreement No. 640573 (Project DESCRAMBLE) and No. 676629 (Project EoCoE) Henrik Büsing hbuesing@eonerc.rwth-aachen.de 1 Institute for Applied Geophysics and Geothermal Energy, E.ON Energy Research Center, RWTH Aachen University, 52074 Aachen, Germany requires an assessment of the behavior of the phases over time. Remediation of NAPLs during groundwater contamination may involve the injection of hot steam. This reduces the viscosity of the NAPLs and supports their easier transport out of the reservoir. Here, groundwater, injected steam, and the NAPL are the existing phases. During nuclear waste storage, brines may corrode the storage containers, finally resulting in the emission of H 2 . Enhanced oil recovery makes use of water or CO 2 injection to increase the oil yield. In this scenario, water, oil, and possibly CO 2 are the existing phases. In this context, the petroleum industry created the first numerical multiphase flow simulators, described by Douglas Jr. et al. as early as 1959 [17]. Both Aziz and Settari [1] and Chavent and Jaffré [13] summarize in detail the equations for flow and transport in petroleum reservoirs.
In the test examples, the focus is on the geological sequestration of CO 2 . Here, brine-saturated sandstone reservoirs at depths greater than 800 m may provide a safe storage site for supercritical CO 2 . Safety assessments require a prediction of the propagation of the CO 2 plume in the subsurface. Given a safe storage site, the geological sequestration of CO 2 is one option to mitigate anthropogenic effects of greenhouse gas emissions on our climate. Compared with the other applications, this example is particularly challenging because of the high injection rates and the fact that CO 2 is injected but no other fluids are produced. Both aspects result in high reservoir overpressures.
The fully implicit discretization of the fully coupled formulation for two-phase flow equations yields a system of nonlinear algebraic equations. Newton's method is used to linearize this system. This requires solving a system of linear equations in each time step and each Newton step. This linear system is often extremely ill-conditioned and asymmetric and couples strongly the different physical quantities, e.g., pressure and saturation.
Historically, ILU preconditioned methods were used to solve these systems due to their general applicability. Unfortunately, they are neither necessarily efficient nor scalable (cf. [40] Chapter 10.3 on ILU factorization preconditioners). As an alternative, direct methods are known for their robustness and reliability. However, they also require a considerable amount of computation, on the order of O(n 2 ), and memory, on the order of O(n 4/3 ) where n is the number of unknowns (cf. [33]). In contrast, multigrid methods promise linear complexity for certain problems and thus belong to the most efficient class of methods (cf. [24]).
Since the exact solution of the Newton system is not required, iterative methods such as GMRES [41], BiCGStab [46] or FGMRES [39] are used which approximate the solution only to a certain accuracy. The most timeconsuming parts of the numerical simulation are the computation of the Jacobian and the subsequent solution of the resulting linear systems. We focus our attention on these two computational kernels to optimize the solution time.
The linear system contains both hyperbolic and almost elliptic properties. The properties of the matrix are described in, e.g., [6,16,29] as well as [45] and summarized by [10].
Two-stage preconditioners, such as the Constrained Pressure Residual method [47,48], decouple these contributions from each other resulting in preconditioners implicit in pressure and explicit in saturation (IMPES). In a first stage, the pressure equation is solved. The overall solution is then updated with the result from this stage and in a second stage, the total system is solved for the remaining saturation variables. The elliptic subpart can be solved efficiently by AMG, followed by an ILU-based solution of the full system. IMPES schemes may impose severe restrictions on time step size due to the explicit handling of the saturation equation. Jenny et al. [27], propose a sequential fully implicit (SFI) multi-scale finite volume (MSFV) method to avoid these restrictions.
Kayum et al. [28], compare various CPR-AMG strategies together with different decoupling and preconditioning strategies, such as Alternate Block Factorization (ABF), Quasi IMPES (QI) and Dynamic RowSum (DRS). Other approaches making use of AMG are presented by Stüben et al. [45], who discuss strategies for solving fully implicit formulations that possess the elliptic properties required by AMG. Additionally, they present an iterative coupling scheme that is faster and also feasible for AMG as an alternative to fully implicit formulations. Mishev et al. [34], use multiplicative and additive overlapping Schwarz preconditioners together with AMG, while Gries [21] uses system-AMG with DRS preconditioning for an efficient solution of the equations in reservoir simulation. Next to these AMG-based approaches, Klíe et al. [30] present a physics-based two-stage percolation aggregation (2SPA) preconditioner and compare it with classical ILU-based preconditioning.
Recently, methods relying on a Schur Complement Reduction have also been used to solve two-phase flow equations (cf. [10]). We present a variant using AMG on the pressure field as a preconditioner and AMG as a solver on the Schur complement, preconditioned by the saturation block. In addition to AMG, geometric multigrid has also been applied to the fully coupled, fully implicit reservoir equations (cf. [5,6]). However, we focus here on AMG due to its general applicability.
Automatic differentiation (AD) (cf. [37] and [23]) allows for an elegant, exact computation of the Jacobians during the Newton step. We use Tapenade (cf. [25]) to perform the required source-code transformation. Given a function F representing the discretized PDE systems, this source-code transformation generates a function dF, calculating the derivative of the function F. This results from applying the chain rule: every source code can be viewed as a concatenation of basic functions, such as multiplication, addition, exponential, or sine functions. These basic functions have a simple derivative and with the chain rule the complete derivative is easily computed.
The advantage of AD over FD is the exactness of the Jacobian. No additional approximation errors are introduced as would be the case for a FD approximation. In addition, it is not necessary to choose a certain finite difference step size, which should minimize the FD error over the entire computation time. Additionally, the advantage of AD over a hand-coded exact Jacobian is its error robustness. Since the derivative of the source code is generated automatically, no errors can be introduced by manual differentiation.
Although automatic in principle, the derivative generation by AD requires some preparation of the source code, such as specifying the independent and dependent variables, as well as the derivative quantities to be computed. Depending on the code and how well-defined its interfaces are, this may require some effort. Nevertheless, the advantages of AD outweigh these additional preparations by far.
This paper is organized as follows: In the first section, we introduce the governing equations and different parameters. Subsequently, we describe the numerical method and highlight the use of exact Jacobians by automatic differentiation (AD). The last section presents three different test cases: (i) CO 2 injection into a two-dimensional domain for an advection-and a diffusion-dominated test case; (ii) model 2 of the SPE10 benchmark problem, referred to as SPE10B; and (iii) CO 2 injection into the Sleipner reservoir. The conclusion highlights the advantages of AD, as well as the competitiveness of SCR-AMG compared with classical CPR-AMG.
In the Appendix, we show the command line options for PETSc (cf. [2][3][4]) for selecting the different solvers. Additionally, the Buckley-Leverett problem (cf. [9]) is used for code verification. Finally, gravitational and CFL numbers are discussed for the different test examples and the influence of anisotropy is examined.

Mathematical model
The system of partial differential equations (PDEs) that governs two-phase flow in porous media consists of two mass balances, one for the wetting (w), water phase, and one for the non-wetting (n), gas phase. The choice of primary variables is typically a combination of a pressure and a saturation. We use a water pressure (p w ), gas saturation (S n ) formulation. Density ρ α and viscosity μ α , α ∈ {w, n} are either constant or depend on pressure and temperature. Permeability K may be either homogeneous or heterogeneous in space and either isotropic or anisotropic in different directions. Porosity φ may be homogeneous or heterogeneous in space. Flow through the porous medium is governed by the usual extension of Darcy's law [15] for multiphase systems by relative permeabilities k rα , a modification factor for the absolute permeability (cf. [7]). The volumetric flow rates v w and v n are defined as: with k rα denoting relative permeabilities, K the tensor of absolute permeabilities, and g = (0, 0, −g) T gravity. The system of non-linear coupled partial differential equations then reads as such: This system is supplemented by algebraic constraints for the saturations, which sum up to 1: S w + S n = 1, and the relation between wetting and non-wetting pressure by the capillary pressure function: Applying these constraints, inserting the Darcy velocities and choosing water pressure p w and gas saturation S n as primary variables, we have: (3)

Relative permeability and capillary pressure
The two most common approaches for modeling capillary pressure p c were proposed by Brooks and Corey [8] and van Genuchten [20]. The Brooks-Corey model for capillary pressure: is often combined with the approach by Burdine [11] for relative permeabilities: while the van Genuchten model for capillary pressure: is often combined with the relative permeability model after Mualem [35]: Here, S e is the effective saturation: where S wr and S nr are the residual wetting and nonwetting saturations. In the Brooks-Corey model, p d is the displacement pressure and λ the pore size distribution index. The displacement or entry pressure p d is the pressure needed by the non-wetting fluid to displace the wetting fluid from the largest pore. For large and small values, the pore size distribution index λ corresponds to a relatively narrow or wide pore size distribution respectively. Relative permeability-saturation relations after [8] and [20] The van Genuchten model uses parameters τ , which can be seen as an inverse entry pressure, and the parameter m, often chosen as m = 1 − 1 n . Ideally, these parameters are determined by experiments. Figures 1 and 2 show the relative permeability and capillary pressure functions of the two models.

Numerical method using exact Jacobians
We use the implicit Euler method as time discretization and a cell-centered finite volume method as space discretization. Let u α = φ ρ α δ αw + (−1) δ αw S n and v α = −λ α K(∇p w + δ αn ∇p c − ρ α g), where λ α = k rα μ α is the mobility of phase α and δ is the Kronecker delta. The domain is discretized  [8] and [20] into cells V i . For a cell V i , the integral form of Eq. 3 reads as: with q α representing source and sink terms of the (non)wetting phase. Using Green's theorem, we transform the volume integral into a surface integral: Applying the midpoint rule for the volume integrals, a twopoint flux approximation for the boundary integral, and the implicit Euler method for the time derivative, we have: with Δt the time step size of time step n, |V i | the volume of cell V i , |A ij | the area of the surface between cell V i and V j , and d i the distance from the center of cell V i to the interface between cell V i and V j . An appropriate averaging at the interface between cell i and j is needed for quantities with indices ij . The mobilities are fully upwinded: with ψ α = p w +δ αn p c −ρ α g. An arithmetic average ρ α,ij = ρ α,i +ρ α,j 2 is used for the densities and a harmonic average for the absolute permeability: We end up with a system of nonlinear equations: where x = (p w , S n ) T ∈ R 2N is the vector of unknowns, and the vector function consists of the two discretized equations for the water and gas phases. Newton's method is used for linearization of the non-linear system Eq. 14. Thus, in every Newton step, the linear system: is solved, where Δx k equals x k+1 − x k in the k th Newton step. The Jacobian J := ∂F(x) ∂x is of the form: The sparsity pattern of the Jacobian J can be seen in Fig. 3. The 2 × 2 block structure with seven diagonals in each block results from the three-dimensional space discretization on a regular grid with 5 × 5 × 5 cubes and thus J ∈ R 250×250 . The J 12 block has only two diagonals due to the upwind scheme used. The J 22 block has the full seven diagonals because this example has non-zero capillary pressure.
The Jacobian in Eq. 16 is modified in such a way that F 1 is replaced by G 1 = 1 ρ w F 1 + 1 ρ n F 2 in the first N rows. The second N rows are left as is: G 2 = F 2 . Thus, we add Eqs. 1 and 2 of system Eq. 3. In the special case of constant densities, this eliminates the time derivatives and yields a purely elliptic pressure block J 11 . The different entries of the Jacobian have the form: To verify the correctness of the exact derivatives, we compare the Jacobian computed by finite differences (FD), J FD , with the Jacobian computed by AD, J AD . We vary the step size h for FD logarithmically with equidistant steps of 0.5 in the interval [10 −18 , 1]. We compare the relative error of the Jacobian computed with AD, J AD , to the finite difference (FD) Jacobian in the maximum norm for matrices |J AD −J FD | ∞ |J AD | ∞ . Figure 4 shows the relative error for the different step sizes h.
The minimal error is obtained for a step size h = 10 −8 . This is also a very typical choice for the FD step size. Nevertheless, this step size is only optimal for the first Newton step in the first time step. Later iterations may require different optimal step sizes. Using AD, we need not worry about the choice of the step size h, as we always use the exact derivative and not an approximation.

Linear solvers and preconditioners
The linear solver is at the heart of the simulation code. Once the Jacobian is formed, the solution of a linear system is to be performed.
For large systems, direct methods break down due to memory consumption and runtime. Block ILU methods are in principle scalable, but the preconditioner deteriorates with the increasing number of blocks. Using ILU as a preconditioner for the full system also does not scale since the computation of the ILU decomposition is not scalable. As a consequence, we need a solver which is known to scale well, such as multigrid methods for elliptic problems. However, these methods need to be applied carefully since the coupled system is rather a degenerated parabolic/hyperbolic one than an elliptic one.
We compare two different solvers: (1) a Schur Complement Reduction method (SCR-AMG) relying on the preconditioner package Hypre [18,19] and its AMG solver BoomerAMG [26] as preconditioner and solver and (2) a Constrained Pressure Residual method also making use of BoomerAMG for the pressure block and a block Jacobi ILU preconditioner, with no level of fill in (known as ILU(0)), for the saturation block. These two solvers are then combined with either a Jacobian formed by AD or FD.

Schur complement reduction
The method is implemented with the fieldsplit preconditioner from PETSc (cf. [3] Chapter 4.5 on solving block matrices). This preconditioner allows addressing the different fields, in our case pressure and saturation, of a multi-physics simulation with appropriate solvers. On the pressure field, we only apply Hypre/BoomerAMG as a preconditioner and do not use an iterative method. The Schur complement is preconditioned by the J 22 block of the Jacobian and Hypre/BoomerAMG is used as a solver, which terminates after a maximum of 10 iterations or when the residual is decreased by two orders of magnitude.
Given a matrix J = J 11 J 12 J 21 J 22 ∈ R 2N×2N the inverse of this matrix can be written as: with the Schur complement S = J 22 − J 21 J −1 11 J 12 of the block J 11 . Thus, the solution of a 2N × 2N system can be reduced to the solution of two N × N systems. For this solution method to be effective, a good preconditioner for the Schur complement matrix is needed. Note that never any of the matrices is actually inverted but rather a linear system is solved, since we do not need the actual matrix, but rather its action on a vector.
We use PETSc option "a11" and consequently J 22 to construct the preconditioner for the Schur complement. This option is justified since all the derivatives in the Jacobian reduce to the Laplacian in case we assume all parameters to be constant. This yields an effective preconditioner for the Schur complement and consequently an overall very effective solution method. Appendix A.1 describes the different PETSc options to allow for maximum comparability and reproducibility.

Constrained pressure residual
In reservoir simulation, the Constrained Pressure Residual method (CPR-AMG) is often used as a solver for the occurring linear systems [22,31,32,42,45]. The idea is to apply AMG to the elliptic contributions and an ILU method to the hyperbolic part. CPR-AMG is implemented by a composite preconditioner consisting of the fieldsplit method and a block Jacobi ILU method. The fieldsplit method is again used to separate the pressure and the saturation fields. But this time, we use it only to select the pressure field. For this field, we apply Hypre's BoomerAMG as a solver. The saturation field is omitted. Finally, the two fields are combined in an additive way. In general, the additive fieldsplit type solves the J 11 and J 22 block Here, the inverse indicates the solution of a (preconditioned) linear system.
Next, the fieldsplit solution is combined with a block Jacobi ILU method in a composite preconditioner of multiplicative type. Let P 1 and P 2 be the two preconditioners. Then, the effect of the combined preconditioner P on a vector x, y = P x, can be obtained by calculating:  [12] are fixed. SHEMAT-Suite [14,38], a forward and inverse modeling code for the simulation of reactive flow, heat, and species transport in porous media is used as a platform for the implementation of the multiphase flow equations.
All test examples use constant fluid and gas properties. Permeability tensor is isotropic for the two-dimensional test cases and anisotropic for the Sleipner and SPE10B test case. Anisotropy factors reach values of 35 for the Sleipner case and 10,000 for the SPE10B case. CO 2 and brine phase are immiscible. Discontinuous capillary pressure is not considered. The simulations use rectangular grids with variable cell sizes.

Test case 1: Two-dimensional CO 2 injection
Two-dimensional CO 2 injection into a heterogeneous porous medium is examined. The domain of interest extends 600 m in x-direction and 100 m in z-direction with a thickness of 1 m in y-direction. The discretization consists of 2.5 m × 2.5 m blocks, yielding 240 cells in x-direction and 40 cells in z-direction, comprising 9600 cells in total. Thus, this problem has 19,200 degrees of freedom.

Initial and boundary conditions
Initially, there is no CO 2 , i.e., S n = 0. Water pressure p w is hydrostatic with the top of the domain situated at a depth of 800 m. This results in pressures at which CO 2 is in a supercritical state with liquid-like densities. Top and bottom boundaries have noflow conditions, assuming the presence of an impermeable caprock above and below the reservoir. The left boundary is also impermeable and CO 2 is injected through the lower 5 cells, i.e., 12.5 m, with a total injection rate of 0.2 kg s −1 . The right boundary is open and water pressure and gas saturation are held constant at their initial values. Table 3 shows rock properties, as well as the constant fluid properties for the incompressible simulations. The advection-dominated case uses an entry pressure of p d = 0 Pa and the diffusion-dominated case of p d = 10 6 Pa.

Heterogeneous porosity and permeability distribution
To model heterogeneity, we sample the porosity field from the porosity distribution shown in Fig. 5. The corresponding permeability distribution is calculated after [36] with a fractal model valid for a Rotliegend sandstone typical of the northeastern German basin: K = 155 φ + 37 315 φ 2 + 630 (10 φ) 10 .
(26) Fig. 7 Saturation distribution S n after 67 days of injection with 0.2 kg s −1 Figure 6 shows the porosity field and the corresponding permeability field calculated after Eq. 26. Figure 7 shows injection into the heterogeneous porous medium. The permeability distribution clearly influences the shape of the CO 2 plume. Tables 1 and 2 show the number of time steps (TS), number of Newton iterations (NI), and linear iterations (LI), as well as the total time to solution for both test cases, namely the advection-and diffusion-dominated ones. Both CPR-AMG and SCR-AMG profit from the exact Jacobian with fewer Newton iterations and fewer linear iterations. The time to solution is also better using AD compared with FD. For the advection-dominated case, the speedup is 1.17 for SCR-AMG and 1.22 for CPR-AMG when comparing AD and FD. Similarly, for the diffusion-dominated case, the corresponding speedups are 1.56 for SCR-AMG and 1.51 for CPR-AMG.
Comparing the advection-dominated case from Table 1 with the diffusion-dominated case from Table 2, CPR-AMG is faster in the advection case (speedup of 2.35 (AD) and 2.26 (FD)) whereas SCR-AMG is faster in the diffusion case (speedup of 1.51 (AD) and 1.45 (FD)).

Test case 2: SPE10B problem
The 10 th SPE comparative solution project [44] comprises a three-dimensional domain of dimensions 365.76 m × 670.56 m × 51.816 m. This domain is discretized into 60 × 220 × 85 cells, with one cell having a size of 6.096 m × 3.048 m × 0.6096 m. This gives 1,122,000 cells yielding 2,244,000 unknowns. Figure 8 shows the permeability distribution. We inject CO 2 at a rate of 6.62 kg s −1 at the  Figure 9 shows CO 2 saturation distribution after 1060 days of injection. This test case is particularly challenging since it covers a permeability range of more than ten orders of magnitude. Thus, only methods using AD are able to finish the simulation in the allowed compute time of 72 h. Within this period, the methods using the FD Jacobian only manage to simulate 25.3% (SCR-AMG) and 44.9% (CPR-AMG) of the total simulation time. CPR-AMG performs better than SCR-AMG due to zero capillary pressure used in this test example. The speedup for AD is 5.81 comparing CPR-AMG and SCR-AMG. All results are summarized in Table 4.

Test case 3: CO 2 injection into the Sleipner reservoir
This test case simulates injection of CO 2 into the Sleipner gas field [43] operated by Statoil and situated in the Norwegian part of the North Sea. The extracted gas contains high amounts of CO 2 . This CO 2 is not vented into the atmosphere, but rather compressed and reinjected into a permeable sandstone of the Utsira formation, approximately 800 m below the seabed. We use a discretization of 65 × 119 × 50 cells, totaling 386, 750 cells with cell sizes of 50 m × 50 m × 1 m. This yields a domain size of 3250 m × 5950 m × 50 m. Assuming an annual injection of approximately 0.9 Mt, we prescribe an injection rate   Figure 10 shows the permeability distribution, as well as the topology of the sandstone layer. Figure 11 shows saturation distribution in layer 43 after injecting 28 kg s −1 of CO 2 for 17 years.  Table 5 shows the comparison of SCR-AMG and CPR-AMG using AD and FD for the Sleipner case. The speedups are 1.97 for SCR-AMG and 3.12 for CPR-AMG, comparing AD and FD. This test case favors SCR-AMG over CPR-AMG with a speedup of 6.58 (AD) and 10.4 (FD).

Weak scaling
We compare the Schur Complement Reduction method (SCR-AMG) and the Constrained Pressure Residual method (CPR-AMG) on JUQUEEN, a BlueGene/Q supercomputer from IBM, with 28,672 nodes located in Jülich, Germany. Each node consists of an IBM PowerPC A2 running at 1.6 GHz with 16 cores and 16 GB of memory. We start with 16 cores for 393,216 cells and finish with 1024 cores for 25,165,824 cells. We perform five timesteps each executing the same number of Newton iterations. We omit I/O completely for this test case, so neither reading the input file nor the output of computed quantities influences the overall computation time. Figure 12 shows efficiency over the number of cells. Here, efficiency is defined as E = S/N with speedup S = T 1 /T N , N the number of cores, T 1 time to solution      with one core, and T N time to solution with N cores. SCR-AMG as well as CPR-AMG both have efficiencies above 95%, demonstrating nearly ideal scaling properties.

Summary and conclusion
The speedup when using AD instead of FD ranges from 1.17 (SCR-AMG) and 1.22 (CPR-AMG) in the advectiondominated two-dimensional case up to 1.51 (CPR-AMG) and 1.56 (SCR-AMG) in the diffusion-dominated twodimensional case to speedups of 1.97 (SCR-AMG) and 3.12 (CPR-AMG) for the Sleipner reservoir. These differences in speedup are due to the different time percentages for the computation of the Jacobian and time spent in the linear solver.
The SPE10B problem turns out to be the most challenging problem. In this case, only the AD code finishes in the desired simulation time of 72 h. CPR-AMG is in particular suitable for advection-dominated cases (speedup of 5.81 in the SPE10B example) compared with SCR-AMG, whereas SCR-AMG is faster in the diffusion-dominated cases (speedup of 10.40 (FD) and 6.58 (AD) in the Sleipner example).
AD always reduces the overall runtime and reduces the number of Newton iterations and linear iterations. The speedup in the Sleipner case is more than threefold when using AD. Moreover, the SPE10B case could only be solved in time with AD. In addition, AD circumvents the definition of an FD step size and the associated approximation errors and avoids error-prone, hand-coded Jacobians. Consequently, we would always advise to use AD.
AD together with algebraic multigrid (AMG) makes these solution methods for the fully coupled, fully implicit two-phase flow equations highly competitive. Due to its physics-based splitting approach, the pressure and saturation field are addressed in an optimal way.
The two presented solvers are both efficient and scalable with efficiencies above 95% on JUQUEEN. While SCR-AMG is more suited for diffusion-dominated cases, CPR-AMG deals with advection-dominated cases better. Through PETSc, both solvers can be easily modified and enhanced. For example, the Schur complement solution step for SCR-AMG by Hypre/BoomerAMG could be replaced by a block ILU method, accounting for an advection-dominated J 22 block. In contrast, for CPR-AMG, the additive fieldsplit could benefit from an additional solution step on the saturation field for a J 22 block with a strong capillary pressure derivative.
Funding Open Access funding provided by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommonshorg/licenses/by/4.0/.

Appendix 1: PETSc command line options for the different solvers
This appendix shows the different command line options used to invoke the linear solvers in PETSc.
The following options for the weak scaling test are shared by all the solvers.

Appendix 2: Code verification using the Buckley-Leverett problem
The Buckley-Leverett problem describes the immiscible displacement of oil by water in a porous medium (cf. [9]). It is widely used for the verification of numerical models. Figure 13 shows the analytical solution for the problem and compares it with numerical solutions with different cell sizes. It is clear that the numerical solution approaches the analytical for increasingly smaller cell sizes.

Appendix 3: Discussion of gravity numbers
The gravitational number is defined as: Gr = (ρ w − ρ n ) g K μ n v cr = gravitational forces viscous forces (27) and relates gravitational and viscous forces. Here, v cr = φl cr t cr is the characteristic velocity with a characteristic length l cr and a characteristic time t cr . Permeability is a scalar, which means that it is assumed to be isotropic. For our test case 1, we have a reservoir thickness of 100 m, thus l cr = 100 m. Simulation time is 67 days and consequently t cr = 5, 788, 800 s. Densities are chosen as ρ n = 450 kg m −3 and ρ w = 1000 kg m −3 . Non-wetting viscosity is μ n = 3.0 × 10 −5 Pa s and permeability is K = 10 −12 m 2 . All in all, we have: Gr = (1000 − 450) · 9.81 · 10 −12 3.0 · 10 −5 · 0.2·100 67·86400 ≈ 52.06.
In addition, we consider two test cases: (i) injection into a hot aquifer with ρ n = 250 kg m −3 and μ n = 3.0×10 −5 Pa s and (ii) injection into a deep aquifer with ρ n = 650 kg m −3 and μ n = 8.0 × 10 −5 Pa s. This leads to gravitational numbers of (i) Gr ≈ 70.99 and (ii) Gr ≈ 12.42. Comparing Table 1 with Table 6, we have a higher number of Newton and linear iterations for the hot aquifer. This is in line with the higher gravitational number of Gr ≈ 70.99 compared with Gr ≈ 52.06 for the original test case. Table 7 shows results for the deep aquifer with a low gravitational number of Gr ≈ 12.42. Here, we have a lower number of Newton and linear iterations. Again, this is in line with the lower gravitational number. Analogous results hold for the diffusion-dominated case.
Gravity numbers for the Sleipner and SPE10B case are Gr ≈ 88.96 and Gr ≈ 555.20, respectively. To calculate a single gravitational number, we used mean permeabilities and porosities. The high gravitational number of the SPE10B case reflects its high degree of difficulty. defined as: Figure 14 shows the CFL number for every time step for the advection-and diffusion-dominated two-dimensional test case for the original, hot, and deep aquifer. The figure shows results from the finite difference approximation of the Jacobian and SCR-AMG.
For the advection-dominated case, we have CFL numbers of over 100 in the 12th time step. For the hot aquifer, CFL numbers reach values of approximately 125 also in the 12th time step and for the deep aquifer values of over 50 in the 13th time step. Similarly, for the diffusion-dominated case, we have CFL numbers of over 140 in the 13th time step. For the hot aquifer, CFL numbers reach a value of over 105 in the 15th time step and for the deep reservoir over 110 in the 13th time step. The high CFL numbers indicate that the fully implicit fully coupled approach is indeed well justified.

Appendix 5: Influence of anisotropy
In this section, the influence of anisotropy on solver performance is examined. The diffusion-dominated twodimensional test case from Section 4 is the basis for the comparison. Results for anisotropy factors of 1:100 and 1:10,000 are shown in Tables 8 and 9, respectively. Comparing Table 8 with Table 2, performance of CPR-AMG decreases. An increase in the number of time steps (factor of 4.21 for FD and 4.69 for AD), Newton iterations, and linear iterations is observed. The performance of SCR-AMG is affected less by anisotropy with a slight increase in the number of time steps (factor of 1.41 for FD and 1.50 for AD), Newton iterations, and linear iterations.
Looking at Table 9, performance of CPR-AMG drastically decreases, both for the FD and the AD case. The number of time steps increases by a factor of 31.90 (FD) and 34.19 (AD) compared with the original values with no anisotropy. The performance of SCR-AMG is significantly more stable. The number of time steps increases only by a factor of 1.59 (FD) and 1.77 (AD) compared with the values with no anisotropy.
In summary, CPR-AMG is affected much more by anisotropy compared with SCR-AMG. Interestingly, this is also true for the advection-dominated case, where CPR-AMG originally outperformed SCR-AMG.