A numerical study of the additive Schwarz preconditioned exact Newton method (ASPEN) as a nonlinear preconditioner for immiscible and compositional porous media flow

Domain decomposition methods are widely used as preconditioners for Krylov subspace linear solvers. In the simulation of porous media flow there has recently been a growing interest in nonlinear preconditioning methods for Newton’s method. In this work, we perform a numerical study of a spatial additive Schwarz preconditioned exact Newton (ASPEN) method as a nonlinear preconditioner for Newton’s method applied to both fully implicit or sequential implicit schemes for simulating immiscible and compositional multiphase flow. We first review the ASPEN method and discuss how the resulting linearized global equations can be recast so that one can use standard preconditioners developed for the underlying model equations. We observe that the local fully implicit or sequential implicit updates efficiently handle the local nonlinearities, whereas long-range interactions are resolved by the global ASPEN update. The combination of the two updates leads to a very competitive algorithm. We illustrate the behavior of the algorithm for conceptual one and two-dimensional cases, as well as realistic three dimensional models. A complexity analysis demonstrates that Newton’s method with a fully implicit scheme preconditioned by ASPEN is a very robust and scalable alternative to the well-established Newton’s method for fully implicit schemes.


Introduction
Simulation of multiphase and multicomponent flow has many applications within the subsurface sciences, including oil and gas recovery, carbon dioxide sequestration, groundwater and subsurface remediation, and geothermal energy management. The basic flow models consist of conservation TotalEnergies, Pau, France equations for each species where the fluid velocity is given by Darcy's law, which together form a nonlinear system of parabolic partial differential equations. This system will usually have a mixed elliptic-hyperbolic sub-character and generally describe delicate balances among capillary, gravity, and viscous forces that vary over time and throughout space. If you combine this with orders of magnitude multiscale variations in rock properties, grids with skew geometries and large aspect ratios, nonsmooth and hysteretic rock-fluid properties, and large variations in temporal and spatial scales, you end up with numerical models that are surprisingly challenging to simulate efficiently.
The standard approach in reservoir simulation (and in simulation of carbon storage) is to use a fully implicit discretization and iteratively solve for all the primary unknowns at once using Newton's method. This requires repeated solves of large, ill-conditioned linearized systems of equations, for which domain-decomposition methods are well established as a means to accelerate the solution process [18]. These techniques can be categorized into methods for variable decomposition and spatial decomposition. State-of-the-art simulators use so-called constrained pressure residual (CPR) preconditioners [5,24,48,49], which fall into the first category and are designed to exploit the mixed elliptic-hyperbolic character of the system by first solving for a pressure-like variable by an efficient algebraic multigrid (AMG) preconditioner [10,12,45,46], followed by a broadband smoother like incomplete lowerupper factorization with zero fill-in (ILU0) applied to the whole system. Standard domain additive and multiplicative Schwarz decomposition methods have been used for decades as a parallelization strategy [6,16,18,24]. In addition, recent multiscale methods (see [27] for an overview) can be interpreted as domain decomposition methods [40]).
Domain decomposition is also very appealing as a nonlinear solution strategy in reservoir simulation: The system of nonlinear equations tends to be strongly coupled, highly nonlinear, and unbalanced (i.e., a safe step length for Newton's method is determined by a small subset of the full variable set [3]), so that unless we take very short timesteps, the initial guess is typically far from the solution. A particularly popular nonlinear variable-decomposition approach is sequential splitting of the full problem into a pressure subproblem and a set of transport subproblems [33,34,38,47,50], which also facilitates using efficient multiscale methods for the pressure subproblem [13,27,36].
Nonlinear spatial decomposition methods, on the other hand, have so far seen limited use. A main reason for this could be that even though nonlinear spatial domain decomposition is excellent at handling unbalanced nonlinearities, it tends to converge slowly for the full residual. Cai and Keyes [3] therefore proposed to use domain decomposition as a nonlinear preconditioner and devised an additive Schwarz preconditioned inexact Newton (ASPIN) method. The method has later been applied to a range of flow problems, including single-phase Navier-Stokes cavity flow [4], two-phase porous media flow [43], and single-phase Forchheimer flow with a restricted, exact Newton version (RASPEN) [9]. Skogestad et al. [43] also demonstrated that the performance of ASPIN is better than that of the linear preconditioning counterpart. Moreover, Liu et al. [30] applied ASPIN with variable decomposition to sequential splitting for two-phase porous media flow, and later also devised a multiplicative (MSPIN) method [29,31]. Other recent work on variable-decomposition MSPIN for subsurface flow includes geothermal applications [51] and simulations with multisegment wells [25].
Herein, we study preconditioning of the fully implicit method with the additive Schwarz preconditioned (in)exact Newton (ASPEN/ASPIN) method on a variety of flow problems in reservoir simulation, including both compressible black-oil type and compositional models. In the following, we will for simplicity refer to both methods as ASPEN. These preconditioners can be seen as a two-step algorithm.
During the first step, we converge all the local subdomains independently with an additive Schwarz algorithm. For each subdomain, we freeze the values outside of the subdomain and solve the fully implicit equations with Newton's method locally. When all variables have been updated, the residual of the equations is zero inside each domain, whereas nonzero values can be observed at the boundaries between the subdomains. The second step of the ASPEN algorithm is to minimize a global function with Newton's method. We show that the local fully implicit solves accurately account for all local couplings, that the global ASPEN solves resolve the global interactions well, and that the combination of these two updates leads to a very robust and scalable alternative to Newton's method when applied to fully implicit schemes.
The paper is organized as follows: Section 2 describes the model, Section 3 reviews the ASPEN framework, and in Section 4 we report and discuss the results of a series of numerical experiments.

Governing equations
We consider a generic set of flow equations for a multiphase, multicomponent system that has been discretized in time by a finite-difference method and in space by a finite-volume scheme Here, M i is the vector containing the conserved quantity of component i in all the grid cells, V i is the vector of flow rates for component i across each cell interface, and Q i is the vector of source terms. Superscript n refers to the timestep and div is a discrete analogue of the standard divergence operator defined over a volumetric grid that maps face values to cells; see, e.g., [26, §4.4] for details on how this operator is defined. We do not make any special assumptions on the grid, except for assuming that it consists of a finite collection of polytopal cells with matching faces. We use a standard Peaceman-type model to relate the effective source terms Q i in each perforated cell to the difference between the wellbore pressure and average cell pressure. This involves solving one extra conservation equation for each of the well phase fluxes. Well controls on bottom-hole pressure or fluid rates are (weakly) enforced by an additional control equation. The detailed formulation used herein is described in detail in [26, §4.3.2 and 12.2.4].
By collecting the individual equations in Eq. 1, we obtain a nonlinear system of discrete residual equations, R(u) = 0. This system is typically solved using Newton's method: We assume that for a value u there exists some update Δu so that u + Δu solves the residual equation and linearize around u, Neglecting higher-order terms and solving for Δu gives us the iterative Newton's method in its basic form In most simulators, the Newton increment Δu is not applied directly but is modified to ensure that the updated state is physically meaningful, and at the same time aiming at stable convergence. One method to this end is to let Δu define the search direction for an inexact line-search method or for more sophisticated trust-region approaches [14,21,35]. Likewise, we typically need to precondition the linear system made up of the Jacobian and residual in order to solve it efficiently. To keep the presentation as simple as possible, we disregard these issues in the following as the specific choices are not essential to the ASPEN solver. Before continuing to discuss the ASPEN framework, we briefly outline some typical examples of Eq. 1.

Immiscible flow
In subsurface flow, we typically consider three phases: aqueous (a), liquid ( ), and vapor (v). For immiscible flow, each phase is assumed to consist of one component only and Eq. 1 thus describes conservation of mass for each phase.
The system of residual equations is formulated using where α = a, , v. Here, φ is porosity, ρ α denotes densities, S α is saturation, v α is phase flow rate, and q α represents volumetric sources of phase α. To close the system, we need relationships for the flow rates v α , saturations S α , relative permeabilities k rα , and phase pressure Here, K is the absolute permeability tensor, grad is a discrete gradient, μ α is viscosity, ρ α is density, g is the gravity acceleration, z is the vertical coordinate, and p α c are capillary-pressure curves.

Compositional flow
In compositional models, we use mass fractions x α,i to account for the amount of component i contained in phase α. In this case, Eq. 1 represents mass balances for each component, so that Note that the closure relationships (5) are expanded by the requirement that the mass fractions must sum to unity for each phase α. In addition, the partition of the components between the phases is determined by an equation-of-state; see, e.g., [41,42,44]. We assume a simple water treatment where the aqueous phase is synonymous with the water component and the liquid and vapor phases can contain any combination of the non-water components.

Sequential splitting
A fully implicit method solves the full nonlinear system of residual (1), with Eqs. 4-5 for immiscible flow and Eqs. 4-6 for compositional flow, for all unknowns simultaneously in each timestep. In a sequential implicit method, we perform a variable decomposition of u into flow variables u p (pressure) and transport variables u t (saturations and/or mass fractions) and then use this to formulate a flow equation and a (system of) transport equation(s). These are then solved implicitly in consecutive steps. The flow equation is derived as a weighted sum of the component conservation (1), The weights ω i are chosen so that all partial derivatives of the resulting equation with respect to transport variables disappear; i.e., i ∂ v (ω i R n+1 i ) = 0 for all v ∈ u t . The transport equation, R n+1 t = 0, consists of the original component conservation equations, but with fixed pressure variables and the flux (V α or V i ) formulated in terms of the total velocity; see [38] for details.

ASPEN framework
We want to find the solution u to the system of nonlinear residual equations R(u) = 0, where u and R(u) are vectors of real numbers. In the following, R will be one of three possible types of equations: (i) the full system of residual component (1), which has a mixed elliptichyperbolic character; (ii) the flow equation with R = R n+1 p , which is parabolic but with an elliptic character; and (iii) the transport equation with R = R n+1 t , which is either hyperbolic or parabolic with a strong hyperbolic character. The additive Schwarz method for solving such problems is generally not very robust by itself, but may be a very efficient nonlinear preconditioner.

Brief review of the method
To explain the method, we first partition the full problem into two subproblems. Without loss of generality, we assume that variables and equations are ordered so that this can be written on the form were u = (u 1 , u 2 ) is a partition of the unknowns into two non-overlapping subdomains with corresponding residual equations R 1 and R 2 that have their accumulation terms defined in the same non-overlapping partitions. In an additive Schwarz method, these are solved independently while keeping the unknowns in the other domain fixed. Formally, we introduce the following solution operator where L i (u) is defined as the solution to subproblem i, Solving R(u) = 0 is then equivalent to finding the fixed point u such that u = L (u). This fixed-point iteration scheme tends to have poor convergence properties, but can be accelerated by means of nonlinear acceleration techniques; Jiang and Tchelepi [15] compares some of these techniques in the context of subsurface flow. One popular class of acceleration techniques are quasi-Newton methods, which reformulate the fixed-point iteration as a residual equation and use information from previous iterates to construct an approximate Jacobian. ASPEN uses a similar approach, but computes the exact Jacobian instead of an approximation. We write the fixed-point equation on residual form, and derive a Newton method equivalent to Eq. 3, The question is now how to compute the Jacobian ∂F/∂u. The derivatives of the implicitly defined solution operator L do not have a simple closed form in the general case and cannot be directly computed by means of conventional techniques such as automatic differentiation. However, by taking the derivative of the first domain equation R 1 (u) = 0 with respect to u, we find that Note that this is evaluated in (u 1 , u 2 ) = (L 1 (u), u 2 ). We rearrange to obtain and similarly, Here, Eq. 14 is evaluated in (u 1 , Altogether, this means that we are able to compute the Jacobian of the solution operator L , and thereby the Jacobian of F, using only known quantities. Moreover, part of the computation is already done, because solving the subproblems (10) necessarily requires that we compute ∂R i /∂u i (provided that we use Newton's method). The coupling terms ∂R i /∂u j must be computed after the subdomain solves. Note also that ∂u i /∂u applied to a vector v simply acts as a restriction onto subdomain i, The method naturally extends to the case where we have m subdomains, each defined by their corresponding nonoverlapping subset. For this, we define a non-overlapping partition of the full problem with corresponding solution operators L 1 , . . . , L m such that The expression for the Jacobian of each solution operator is analogous to the two-subdomain case and is found in the same manner: Wells often constitute the parts of the model equations that have the strongest coupling and thus represent a primary hindrance to nonlinear convergence. Equilibration within the wellbore is usually assumed to be (almost) instant. In our experience, it is thus important that all the variables of a single well, or all well variables that are subject to the same group or field control, belong to the same subdomain. Grouping variables from wells that are far apart into a single subdomain does not pose any conceptual difficulties for ASPEN, because the method does not require spatially contiguous subdomains.
The near-well region usually contains a relatively large pressure drop and tends to exhibit strongly coupled and nonlinear behavior. We therefore group all well variables and the reservoir variables from a region surrounding each well into a single subdomain. By default, the surrounding region is set to be the perforated cells plus a padding layer, defined so that each perforated cell has at least one topological neighbor between itself and the nearest subdomain boundary, but in some examples we use somewhat larger domains. Using at least one level of cell padding also means that the formulation discussed in the previous section does not need any modification to account for extra coupling terms that potentially could have been introduced by wells, because the residual well equations only have nonzero derivatives with respect to reservoir variables from the perforated cells. Subdomains must be merged if they overlap in space or if the wells are subject to a common control, which is not the case for any of the examples considered herein.

Some properties of the linearized system
In contrast to the Jacobian ∂R/∂u of the original problem, the Jacobian ∂F/∂u is generally dense, and efficient preconditioners are therefore challenging to construct. However, a breakdown of the blocks in Eq. 14 reveals that and similarly for the second subdomain. The first block, ∂L 1 /∂u 1 , is zero because L 1 gives the solution in subdomain one regardless of the initial guess u 1 . However, this solution will obviously depend on the boundary conditions imposed from the constant values u 2 in subdomain two, and consequently, ∂L 1 /∂u 2 is generally nonzero. Looking at Eq. 12, it follows that the diagonal blocks of ∂F/∂u are simply the identity, and In this linear system, ∂R 1 /∂u 1 and the rows of ∂R/∂u corresponding to the residual equation in subdomain one are evaluated in u = (L 1 (u), u 2 ), whereas ∂R 2 /∂u 2 and the rows of ∂R/∂u corresponding to subdomain two are evaluated in u = (u 1 , L 2 (u)). This formula enables us to interpret the linearized system as This is just the usual linearization (3) Figure 1 illustrates the entire simulation workflow, from an initial state until the final time horizon is reached. The figure highlights important aspects of an ASPEN implementation and therefore merits some discussion. Because all subdomains are solved with fixed quantities from the previous nonlinear iteration, they can be solved concurrently. In this sense, the local stage is perfectly parallel. Note that we also check for convergence after the local stage. This can constitute significant computational savings, because it opens up for circumventing the global stage altogether. If the global residual is not converged after the local stage, we proceed to the global stage. This stage obviously requires some communication but can nonetheless be implemented very efficiently by utilizing the fact that all Jacobian blocks ∂R i /∂u j for subdomain i will be contained on the same worker (or processor). During the global stage, each worker will in effect have access to an entire row of Jacobian blocks. To compute the right-hand side, we therefore only need to communicate F(u) to all the workers. Moreover, if we use a Krylov-type iterative solver like GMRES, we only need to compute matrix-vector products on the form −(∂R/∂u)v. This can be efficiently implemented by communicating the linear iterate to all workers, multiplying it by its local row of Jacobian blocks, and gathering the results.

Complexity analysis
Assume N cells and an average of n = N/m cells in each subdomain. Assume that one row in the Jacobian has d nonzero entries. The dominant part of a nonlinear iteration is the linear solve, which has complexity (Nd) γ , where γ may vary from 1.2 for modern iterative linear solvers to 3 for direct Gaussian elimination. For spatially contiguous subdomains, including the well subdomains considered herein, the only off-diagonal elements neglected in the subdomain Jacobians correspond to cells outside the subdomain. To be on the conservative side, we therefore say that also the subdomain Jacobians have d entries per row. Hence, a linear solve will have complexity (nd) γ . The outer iteration of ASPEN will be of the same complexity as a standard Newton iteration, e.g., be dominated by the linear solver with complexity (Nd) γ . Assuming k is an upper bound for the number of nonlinear iterations used to solve one subdomain, the cost of one ASPEN iteration relative to the cost of one regular Newton iteration can be written as when all the local subdomains are solved concurrently, and when the local subdomains are solved serially. If the global residual has converged after the local stage (c.f. Fig. 1), we do not solve the global system and thus only count the added cost of the local solves (i.e., the last term). This analysis does not account for the cost of assembly, which scales linearly with the number of cells and primary variables, and that we for sufficiently small subdomains use a direct linear solver, which is generally more robust than iterative solvers.

Numerical examples
The ASPEN method is implemented using the automatic differentiation (AD-OO) simulator framework of MRST [26]. In the following, we report the performance of ASPEN on a number of examples using both fully implicit and sequential solution strategies and compare its efficiency with results from a nonlinear Newton-Raphson solver with damping strategies used in commercial simulators, but without preconditioning. We refer to the former method as ASPEN and the latter as Newton. The examples include both conceptual setups constructed to challenge the nonlinear solver and realistic setups with industrygrade geology and fluid properties. Applying ASPEN in combination with a sequential solver means herein that we solve both the pressure and transport subproblems with ASPEN, using the same spatial domain decomposition. For simplicity, we do not include outer iterations, and the sequential results are thus not guaranteed to converge to the fully implicit solution.

Example 1: Buckley-Leverett displacement
We consider water injected at a constant rate into a horizontal, one-dimensional, oil-filled reservoir with fluids produced at a constant pressure at the other end. Relative permeabilities are quadratic for both phases, whereas all other rock and fluid properties are set to unity. We subdivide the 100-cell grid into five subdomains of 20 cells each and simulate the problem with fully implicit and sequentially implicit solution strategies. The top part of Fig. 2 reports the saturation after the local and global stages of an ASPEN iteration for a sequentiall simulation with CFL number equal 5. After the first local solve, the intermediate solution clearly exhibits kinks across the subdomain boundaries. The global iteration effectively levels out these kinks, so that after a full ASPEN iteration, the transport problem has almost converged. We also see that the subdomain iterations are localized to the propagating wave (shock followed by rarefaction wave). For the pressure update, the first set of subdomain solves has limited effect due to the elliptic nature of the pressure equation, but the global step effectively converges the pressure. Figure 3 compares the average number of nonlinear iterations consumed per timestep for ASPEN and standard Newton for fully implicit and sequential solution strategies. Because ASPEN uses both subdomain and global iterations, we use C c from Eq. 22, averaged over all timesteps, to define a number that represents a comparable computational cost to the Newton iterations. This is done for all examples in the following. We see that the ASPEN solver successfully reduces the effective number of nonlinear iterations for the fully implicit and transport problems compared with Newton. The pressure subproblem is linear for this example It is also of interest to see the spread in the number of nonlinear subdomain iterations for ASPEN. Figure 4 reports a normalized histogram of the number of local iterations used per subdomain throughout the whole simulation. Because each ASPEN iteration involves converging every subdomain over an entire timestep, the maximum number of subdomain iterations per timestep (i.e., k in Eq. 22) will almost always be larger than the number of outer ASPEN iterations. However, keep in mind that a subdomain iteration is a lot less expensive compared to a global iteration due to the reduced size of the subdomains and the corresponding linearized systems.

Example 2: Fractured reservoir
Our next test is a slightly modified variant of an example from [38]. We consider a 1000 × 500 m 2 reservoir crosssection containing thirteen high-permeability fracture channels. The background sand consists of five bands in the eastwest direction. Within each band, the permeability is drawn from a lognormal distribution with a distinct mean. We discretize the domain using conforming Voronoi cells generated by the upr module in MRST [1], and use a volumetric representation for the fracture channels (see Fig. 5).
We assume a two-phase, liquid-gas model with ndecane, carbon dioxide, and methane, with phase behavior defined by the cubic Peng-Robinson equation-of-state [41]. The reservoir is initially filled with a mixture of the three components. A well in the southwest corner injects a mixture of n-decane and carbon dioxide, whereas the producer in the northeast corner operates at a bottom-hole pressure of 50 bar. We simulate 2 555 days of injection with timesteps that gradually increase to 20 days, as a reasonable compromise between too high CFL numbers in the fractures and too low CFL numbers in the matrix. Rapid fingering of injected fluids through the fracture network, combined with initial reservoir pressure just below bubblepoint nevertheless makes this a very challenging test case for the nonlinear solver. Figure 6 reports the mole fraction of carbon dioxide at four selected timesteps along with the number of nonlinear iterations required to solve these timesteps with Newton and ASPEN. Before the injected carbon dioxide reaches the producer after approximately 900 days, both Newton and ASPEN converge steadily, using on average 3.7 and 2.0 nonlinear iterations per timestep, respectively. From 900 days and until the end of the simulation, Newton struggles significantly, and as a result, cuts the timestep in half multiple times, giving a large number of wasted iterations. ASPEN, on the other hand, continues to converge steadily. This is evident from the two last reported timesteps, for which Newton uses 62 and 31 nonlinear iterations to converge, whereas ASPEN converges in 2.2 and 2.1 scaled iterations, respectively. Averaged over all timesteps, Newton consequently uses 18.6 nonlinear iterations, whereas ASPEN requires only 2.2.
To investigate how the number of subdomains affects the nonlinear iteration count, we also run the same setup with 3 × 3, 7 × 7, and 10 × 10 partitions. Figure 7 reports the average number of nonlinear iterations per timestep. To give an idea of how computational effort shifts from the local to the global stage, the bar plot shows both the observed number of global ASPEN iterations and iterations scaled according to Eq. 22. As expected,  the two seem to converge to a constant difference, but from opposite sides: Unscaled iterations increase slightly with the number of subdomains because fewer subdomains means fewer kinks in the solution after the local stage (c.f. Fig. 2), and consequently less global iterations to resolve long-range interactions (think of the trivial case with a single subdomain, for which the solution converges in one global iteration). Scaled iterations decrease slightly with the number of subdomains because the theoretical cost of solving each subdomain tends to zero with the relative subdomain size. (In practice, each solve also involves a startup cost, which is not accounted for in our simplified analysis). The scaled iterations converge to a number slightly smaller than the unscaled ones because some of the ASPEN iterations converged after the local stage, so that we have only counted the cost of the local stage for these.
The normalized histograms of subdomain iterations reported in Fig. 8 show that approximately 90% of all subdomain solves required five, four, three, and one iteration or less to converge for the 3 × 3, 5 × 5, 7 × 7, and 10 × 10 partitions, respectively. A few subdomains require up to 15 iterations, but because these solves are localized, the cost is significantly less than for Newton, for which lack of convergence in a single cell will cause the solver to continue iterating over the whole domain. Note also that more than 10% of the subdomain solves do not require any iterations to converge for the 10 × 10 partition.

Example 3: Gravity segregation with horizontal barriers
The example is inspired by a test case from [11]. A vertical 100 × 100 m 2 reservoir cross-section with homogeneous permeability of 1 mD and porosity of 0.3 is initially filled with an immiscible three-phase fluid. A heavy aqueous phase (density 1 500 kg/m 3 , viscosity 1 cP) occupies the top 10% whereas the bottom 10% is occupied by a light gaseous phase (density 500 kg/m 3 , viscosity 1 cP). An oleic phase with density 1000 kg/m 3 and viscosity 2 cP fills the remaining 80% of the domain. We use quadratic Brooks-Corey relative permeabilities for all phases [2]. Density differences will cause the aqueous phase to gradually exchange places with the light gaseous phase. The reservoir has several horizontal, partially sealing barriers that will deviate flow and give rise to a complex, gravity-driven flow pattern with fast-flow regions around barrier openings and stagnant regions close to barrier centers. Figure 9 shows the setup, with barriers and initial saturation.
To minimize the coupling among subdomains, we partition the domain so that regions with fast vertical flow lie close to the center of each subdomain. To this end, we first solve an incompressible pressure-drop problem from bottom to top and use the resulting interface fluxes to identify fast-flow regions. We use the center of each fast-flow region, together with points along the boundary, to construct a Delaunay triangulation and its dual Voronoi diagram. The partition is then defined by matching cell centroids in the grid to encompassing Voronoi cells. Finally, we split subdomains that are divided by a sealing barrier and merge the smaller part into a neighboring subdomain. The left part of Fig. 9 shows the subdomain partition. Figure 10 reports the saturation at four selected timesteps along with the corresponding total number of subdomain iterations for the fully implicit ASPEN solver. As expected, more subdomain iterations are required in regions with large changes in the flow, but localizing iterations also means that some subdomains are already converged, so that no iterations are needed. Figure 11 compares the average number of nonlinear iterations for the fully implicit and sequential implicit strategies with Newton and ASPEN. The transport subproblem in this example is particularly challenging for the nonlinear solver and is also where ASPEN gives the largest iteration reduction. Because the nonlinearities are resolved locally, the global ASPEN update will be very accurate.
The pressure subproblem only benefits slightly from the domain decomposition. We believe the main reason is that the pressure remains transient throughout large parts of the simulation as a result of changes induced by the upward and downward movement of phases with different density and subsequent change in gravity head. The main argument for using ASPEN for the pressure problem would thus be better parallel scalability.
Compared with the previous example, we see that a significantly larger fraction of the subdomain solves require more than seven iterations both in the fully implicit and the sequential algorithm (see Fig. 12). The distribution is nonetheless skewed toward the lower end of the scale: 6-7% of the subdomain solves do not require any iterations at all and the approximate 90 percentile is found at eight iterations for both solvers.

Example 4: Field-scale model (SAIGUP)
Moving on from conceptual examples, we next consider a field-scale model with realistic geology (Fig. 13). The geomodel is a realization of a 9 × 3 km 2 shallow-marine oil field from the SAIGUP study [32], modelled as a 40 × 120 × 20 corner-point grid with 78,720 active cells. Permeabilities and porosities are drawn from multimodal distributions to model mud drapes and fast-flow regions.
The reservoir also has multiple faults delineating different flow compartments. We consider a simple two-phase oil/water model, with viscosity of 1 and 5 cP and density of 1014 and 859 kg/m 3 for the aqueous and oleic phase, respectively. Both phases are slightly compressible with quadratic Brooks-Corey relative permeabilities [2]. Initially at hydrostatic rest, the shallow regions of the reservoir are filled with  oil, whereas the remaining deeper regions are saturated with formation water. Seven injectors are placed along the reservoir perimeter, each injecting water at a constant rate in the range of 500-2 000 m 3 per day. The injectors encompass six producers operating at a fixed bottom-hole pressure of 200 bar. We simulate 30 years of production using timesteps that gradually increase to 100 days. To subdivide the domain, we first define a tube of consisting of all cells within a radius of 400 meters of each wellbore and merge any intersecting tubes into one subdomain. The remaining cells then are partitioned using the METIS graph partitioning algorithm [17] with face transmissibilities as edge weights (see Fig. 13). Figure 14 reports water saturation after 0.02, 4.7, and 18.4 years of injection, along with the corresponding total subdomain iterations used to solve each timestep. Notice that several subdomains require zero iterations for all the reported timesteps. Figure 15 reports the average number of nonlinear iterations per timestep with Newton and ASPEN for the fully implicit and sequential implicit solution strategies. The setup in this example results in only moderate CFL numbers, and Newton therefore performs near optimal, using approximately two iterations per timestep on average.
We nonetheless see that nonlinear preconditioning with ASPEN is able to reduce the iteration count further for all solution strategies. Moreover, the histogram of subdomain iterations in Fig. 16 reveals that 85.0% and 90.9% of the subdomain solves required two iterations or less for the fully-implict and transport solvers, respectively. Altogether, this indicates that ASPEN is also well-suited for more typical simulation setups commonly seen in reservoir engineering applications, which is arguably more important than excellent performance on challenging corner cases.

Example 5: Water-alternating gas injection
After an initial water flood, a popular tertiary recovery technique is to inject solvent gas into the reservoir to dissolve and mobilize trapped residual oil. The gas is usually injected in smaller volumes, with water injection in between the gas volumes to uphold a favorable mobility ratio. We revisit an example from [33] posed on the first layer of Model 2 from the 10th SPE Comparative Solution Project [7], which is initially filled with a mixture of carbon species (C1, C3, C6, C10, C15, and C20), all in the liquid phase. We use a three-phase model with six components plus water and assume that water is immiscible. A well in the lower-left corner injects C1 gas for 5000 days, followed by water for 5000 days, followed by C1 gas for 5000 days. Fluids are produced from a well in the upper-right corner operating at a fixed bottom-hole pressure of 275 bar. Figure 17 shows the setup and final saturation, along with a 3×7 rectilinear subdomain partition used to decompose the domain.
A recurring issue in reservoir simulation is how to select appropriate timesteps. In practice, one compromises between timesteps long enough to get the simulation done in reasonable time and short enough to not impede nonlinear convergence. Emphasis tends to be on the latter, because a timestep that does not converge within the prescribed iteration limit is reduced and restarted, resulting in a potentially large amount of wasted computational effort. In this example, we compare Newton and ASPEN for timestep targets of 25, 50, 100, and 200 days. At the beginning, and when changing from/to water to/from gas injection, we use eight timesteps that gradually increase to the timestep target, resulting in a total of 624, 324, 174, and 99 timesteps for the different setups. The Newton solver is allowed to perform a maximum of 25 iterations without converging before the timestep is halved and restarted. Figure 18 shows the C1 mole fraction at five times for the setup with finest timesteps, together with corresponding subdomain iterations. Figure 19 reports the average number of nonlinear iterations per timestep. As expected, the iteration count increases with the timestep length for both solvers: from 3.1 to 9.9 for Newton, and from 1.4 to 2.9 for ASPEN. As in the SAIGUP example, we see that ASPEN performs similar to Newton when this solver performs near optimal. Moreover, the increase in iteration count with timestep length is significantly smaller for ASPEN than for Newton.
From the subdomain iterations in Fig. 20 we observe that the distribution shifts towards the right (i.e., the subdomain solves consume more iterations) for larger timesteps: 94.2% of all subdomain solves consume two iterations or less for the setup with timesteps of 25 days, whereas the (approximate) 90% percentile is at three, four, and six iterations for the setups with timesteps of 50, 100, and 200 days, respectively. Together with the very modest increase in average nonlinear iterations reported in Fig. 19, this demonstrates how ASPEN efficiently resolves unbalanced nonlinearities locally, which makes the method very robust with respect to timestep length. This robustness is very valuable when simulating complex recovery scenarios.

Example 6: Unstructured grid
As an example of an unstructured grid, we consider an inverted five-spot pattern in a rectangular domain of 300 × 600 m represented on a 2.5D perpendicular bisector (PEBI) grid. Petrophysical properties representative of a shallowmarine reservoir are sampled from the first 35 layers of Model 2 from the 10th SPE Comparative Solution Project [7]; see Fig. 21. We use two different partitions: a simple 3 × 7 rectilinear partition and a partition strategy with cylindrical subdomains around each well and METIS for the rest of the reservoir (see Example 4). We simulate injection of water at a constant rate for 2000 days; the upper row of Fig. 22 shows three snapshots of the resulting displacement.
From the iteration counts reported in Figs. 22 to 24, we see that the nonlinear subdomain iterations follow the same trend for both partitions and that the performance of ASPEN appears to be almost invariant to partition type, albeit with a slight preference to the METIS partition, in which the subdomains are adapted to the heterogeneous rock.
As discussed in the complexity analysis, solving the linearized systems tends to be the dominant part of a nonlinear iteration. In this example, we therefore also look at linear solver performance. We use GMRES with CPR preconditioning [8] as the linear solver, both for ASPEN and Newton. Figure 23 reports average linear iterations per nonlinear iteration. On average, using ASPEN increases the number of linear iterations per nonlinear iteration by 29.7% and 22.3%, respectively. Such an increase should be expected, even if the linear system itself has the same structure as for regular Newton, because the ASPEN variable updates are necessarily larger in magnitude. Figure 25 illustrates this effect for a timestep for which Newton uses 18 nonlinear iterations, whereas ASPEN uses 5 (rectilinear) and 4 (METIS). The first two nonlinear updates of Newton consume as many linear iterations as the ASPEN solvers, while the last ones consume less. All in all, however, we can conclude that ASPEN does not seem to significantly impede linear convergence.

Concluding remarks
This work presents a highly flexible implementation of nonlinear spatial-domain decomposition preconditioning with ASPEN that can handle industry-grade compositional fluid models and general, polytopal grids. Through a series of test cases with different types of flow physics, heterogeneity, and grid types, we have demonstrated that ASPEN can be a competitive nonlinear solution strategy, especially for complex injection scenarios with unbalanced nonlinearities.
A key problem with standard Newton's method is that local convergence problems have to be resolved through expensive global iterations. (Such problems are typically encountered in the near-well region, near strong displacement fronts, in areas with strong gas expansion, when wells are opened or shut in, and in regions with high media contrasts combined with deviated cell geometries, etc.) With ASPEN, convergence problems are usually handled locally in the nonlinear preconditioning stage, which enables the method to focus the computational effort and avoid wasting iterations on converged subsets of variables. In many cases, a significant fraction of the cells remain invariant from one step to the next, typically in unswept areas for secondary and tertiary recoveries, and with ASPEN one avoids iterating these cells. As a result of the localization, ASPEN also appears to be more robust with respect to timestep lengths. Likewise, the method is almost as efficient, and in some cases more efficient than Newton's method at its best. This is important in practice, because it is generally difficult to select timesteps that are sufficiently long to ensure computational efficiency and at the same time not so long that they cause convergence problems in the nonlinear solver.
These conclusions are based on using nonlinear (and linear) iterations, accompanied by an estimate of computational costs based on a complexity analysis, as our measure of computational efficiency. This is obviously not a totally reliable metric, and a full assessment including CPU times is necessary to draw firm conclusions on performance over the standard Newton method. We are currently implementing a version of this algorithm in a C++ reservoir simulator to assess the runtime performance in practice. The observed reductions in nonlinear iterations for our prototype implementation are nonetheless significant, giving a clear indication of enhanced nonlinear solver efficiency. Moreover, our reformulation of the global linearized system enables us to leverage well-established, efficient linear solution strategies. This is particularly important for large simulation problems, for which lack of efficient linear preconditioners would render computation of the global update prohibitively expensive. In line with previous work, we also argue that ASPEN is very well suited for parallelization. In fact, with our reformulation of the global linearized system, ASPEN may be seen as a way to parallelize Newton's method itself.
Another advantage of the ASPEN framework is its treatment of wells, which are often a main cause for nonlinear convergence issues. The overall numerical strategy presented in this paper is sufficiently general to allow for various decomposition and solution options, including solving wells as a separate subdomain (akin to nonlinear facility solvers found in some simulation tools). For the test cases studied herein, we have used a slightly different strategy that assigns all perforated cells, padded by at least one layer of extra cells, to a single subdomain. (Note that there is no requirement that ASPEN subdomains are connected, so that a well subdomain can consist of multiple disconnected parts.) The rest of the domain can then be decomposed by means of a suitable graph partitioning algorithm. The drawback of this approach is that the resulting subdomain can represent a large fraction of the reservoir cells for reservoirs with high well density. Subdivision is straightforward if wells are localized and operated independently, but further research is necessary to devise good strategies for handling cases with long and complex (multibranch) trajectories and/or wells subject to group or field constraints/controls, e.g., to ensure that constraint violations are handled correctly across subdomains.
We have tested our ASPEN implementation both for fully implicit problems and for pressure and transport subproblems in sequential implicit strategies. As expected, we observe significantly higher efficiency gain for transport problems than for pressure problems. A natural extension is to use ASPEN as a nonlinear transport solver in combination with local timestepping techniques [28] and spatial adaptivity with dynamic coarsening [20,23]. As argued in [19], nonlinear transport solvers localized by optimal cell ordering [21,39] are multiplicative Schwarz methods. Our results encourage further development of such solvers as multiplicative Schwarz preconditioning techniques (MSPEN). Another possible approach is to combine ASPEN with sequential fully implicit strategies [37], and in [22] we discuss how the choice of fully implicit versus sequential implicit methods can be localized to individual subdomains with ASPEN as an overall framework.
Acknowledgements The authors thank Halvor Møll Nilsen for fruitful discussions and TOTAL E&P for funding and allowing the publication of this work.
Funding Open access funding provided by SINTEF AS.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.