Accelerating the solution of linear systems appearing in two-phase reservoir simulation by the use of POD-based deflation methods

We explore and develop a Proper Orthogonal Decomposition (POD)-based deflation method for the solution of ill-conditioned linear systems, appearing in simulations of two-phase flow through highly heterogeneous porous media. We accelerate the convergence of a Preconditioned Conjugate Gradient (PCG) method achieving speed-ups of factors up to five. The up-front extra computational cost of the proposed method depends on the number of deflation vectors. The POD-based deflation method is tested for a particular problem and linear solver; nevertheless, it can be applied to various transient problems, and combined with multiple solvers, e.g., Krylov subspace and multigrid methods.


Introduction
Solutions of systems of linear equations are required when simulating flow through subsurface porous media [27]. These linear systems emerge during the iterative solution of time-and space-discretized nonlinear partial differential equations that govern the porous media flow problems. In the case of multi-phase flow, the equations involve an elliptic or parabolic pressure equation coupled to one or more near-hyperbolic saturation equations.
There are several ways to treat this system of equations, among others, the Implicit Pressure Explicit Saturation (IMPES), Fully Implicit (FI) and Adaptive Implicit Methods (AIM) [13,25]. In this work we use the IMPES scheme implemented in MRST [30], that decouples the pressure from the saturation via the fractional flow formulation, see Appendix B and [47].
For the IMPES scheme, the saturation is calculated explicitly, while pressure is kept implicit, and solving the pressure equation is the most time-consuming part, especially for large and ill-conditioned systems, which often occur because of the complex geometry and strongly heterogeneous rock properties that constitute the porous medium. Furthermore, for time-varying problems, it is required to compute a large number of simulations, which makes the solution of these problems expensive.
Even larger computational demands occur when subsurface flow models are used in workflows for optimization or uncertainty quantification which typically require the simulation of a vast number of models. Various techniques have, therefore, been developed to improve the linear solver speed.
Among others, Reduced Order Models (ROM) methods are used to capture relevant information of a highdimensional system and to project it into a lower-dimension space [3,10,28,41,50], which is easier to solve. With these methods, essential system information can be obtained by computing a small set of basis functions from a collection of system solutions (also known as 'snapshots'). Proper Orthogonal Decomposition (POD) is a ROM method that has recently been used to accelerate the solution of the linear pressure equation resulting from reservoir simulation [24,35,40,42,48], among other applications.
For the computation of the POD basis, two main approaches are used. In the first one, a training simulation is run, and the solutions are stored as snapshots, which are collected to obtain a POD basis. This methodology is especially suitable for solving problems with small changes in the input variables, e.g. the same well configurations but different flow rates or bottom hole pressures (bhp) [35,40,48]. The basis can also be computed on-the-fly, using, e.g., the solution of the latest time steps [16,40,42]. With this approach, the basis has to be adapted during the simulation. Once the basis is obtained, various POD-based strategies can be used to solve the system. In the future, we will refer to the first approach as training phase approach, and the second as moving window approach.
For the solution of a large-scale system, Markovinovic et al. [42] proposed using POD techniques to compute a good initial guess that accelerates the iterative method for simulation of two-phase flow through heterogeneous porous media. For the same type of problems, solving the problem in the small-scale domain, and projecting it back to the large-scale system was implemented by Astrid et al. [40].
For many applications, Krylov subspace iterative methods are used [19,46] to accelerate the solution of linear systems. The speed of convergence of these methods depends on the condition number and the right-hand side (rhs) of the system. If the condition number is large, generally, preconditioning techniques are needed to transform the original system into a better conditioned one. If the system is Symmetric Positive (Semi) Definite (SP(S)D), a commonly used Krylov-subspace method is Conjugate Gradient (CG) [8,9,20,26,28]. If the system is not SP(S)D, methods like GMRES are used. These methods can be accelerated further with preconditioning [32,45], or recycling strategies like augmentation and deflation [8].
For some Krylov methods, the computation of an orthogonal basis for the entire Krylov subspace is required to obtain an approximate solution. As the number of iterations increases, the work and the storage requirements grow dramatically. To reduce the extra work, some methods are restarted after some number of iterations. For the GMRES method, system information is obtained from previous solutions or from the Ritz vectors and combined with restarting, truncation and deflation techniques to accelerate the method [11,34,43,44].
If the spectrum of the system contains few small eigenvalues, or if the initial error vector has small components for the lowest eigenvalues modes, the rate of convergence of the initial sublinear phase is more rapid. Such an initial error vector can be obtained using preconditioners based on multilevel meshes working from coarser meshes to finer ones [38]. In recent years, deflation techniques have been developed to accelerate the convergence of Krylov subspace methods [8,9,22,26,49]. Especially useful when a sequence of linear systems with constant or slightly varying matrices has to be solved [31]. For this technique to be effective, a deflation subspace needs to be found. This subspace is such that the smallest eigenvalues of the system are no longer hampering the convergence of the iterative method. The reuse of system information obtained from POD has been increasingly approached recently. Carlberg et al. [28] proposed a POD-augmented CG algorithm for Krylovsubspace recycling applied to solid-mechanics problems. Another approach was developed by Pasetto et al. [10], who suggested constructing a preconditioner for the CG method, based on a POD basis for the solution of groundwater flow models. The use of the POD basis within a deflation procedure to accelerate the CG method was introduced by Diaz Cortes et al. [16] for single-phase flow simulation problems.
In this work, we extend the work of Diaz Cortes et al. [16] to two-phases, and we introduce the capture of information via POD methods with a training phase approach besides the moving window approach proposed in [16]. The acquired information is used for the construction of the above-mentioned deflation subspace. We explore the applicability of this methodology for the simulation of two-phase flow in large-scale, highly-heterogeneous porous media.
In Section 2, we give a brief overview of the methods used to solve linear systems. Section 3 is devoted to numerical experiments, where we give some examples and present some results. Finally, we formulate the conclusions. The governing equations used for the simulation of twophase flow problems, together with the discretization schemes used in this work can be consulted in Appendices A and C.

Solution methods for linear systems
Iterative techniques are preferred over direct methods to approximate the solution of ill-conditioned, large and sparse linear systems. In this section, we give a brief overview of the Krylov iterative methods, used to accelerate the solution of this kind of systems. We also introduce the most popular acceleration techniques for Krylov subspace methods: preconditioning, augmentation, and deflation. Finally, we introduce the POD method that together with preconditioning and deflation are used for the acceleration of the CG method throughout this work.

Krylov subspace Given a linear system
with A ∈ C N×N , the initial residual r 0 := b − Ax 0 , where x 0 ∈ C N is an initial guess of x, and k is a positive integer. We define the k-dimensional Krylov subspace as: For Krylov iterative methods, the solution can be obtained with the iteration procees presented below, for the k − th iteration we have:

Conjugate gradient (CG) method
The CG method is a Krylov subspace method used for Symmetric Positive (Semi) Definite matrices (SP(S)D). For this method, the convergence bound is given by: where the condition number . As A is SPD, κ 2 (A) = λ max (A) λ min (A) . Typically, the convergence of the Krylov subspace methods takes place in three phases with different rates of convergence. In the first phase, the CG method lacks information about the spectrum, and acts mainly on the information of the initial iteration error, leading to a short but rapid initial convergence rate, also known as the sublinear phase. Later, the influence of the spectral condition number becomes larger and the method enters the linear phase. Finally, when a sufficient number of extreme eigenvalue components have been damped out, the method enters the so called superlinear convergence phase [4,6,38].
Acceleration of the Krylov subspace methods can be achieved for the three stages. By improving the initial guess so that the error is minimized, by changing the condition number of the system, or by removing the influence of the smallest eigenvalues of the system matrix. There are several methods to achieve acceleration. Next, we present some of them.
Preconditioning To accelerate the convergence of Krylov subspace methods, the linear system is multiplied by a matrix M −1 , such that the iteration matrix has a better spectrum and M −1 is cheap to compute. The resulting preconditioned system is: The convergence bound of the preconditioned iterative method is given by: where the condition number of the preconditioned system is smaller than the one of the original system, i.e., For the CG method, a commonly used preconditioner is the Incomplete Cholesky (IC) factorization, given by A = C T C + R, with R the error matrix of the factorization. In this work, as preconditioner, we use the IC factorization of order 0, meaning that the non-zero entries of the C matrix are the same as the non-zero entries of the system matrix A. We refer to the Conjugate Gradient method preconditioned with the IC factorization as ICCG.

Augmentation and deflation techniques for Krylov methods
Even after preconditioning, the spectrum of the preconditioned system matrix can contain few small eigenvalues that tend to slow down the convergence rate of the Krylov subspace methods. Therefore, it is required to use techniques that treat these eigenvalues. Among others, deflation and augmentation methods are used to remove the influence of these eigenvalues [12,37,39].
As the iteration of a Krylov method proceeds, the Ritz values converge, i.e., extreme eigenvalues of the system matrix are approximated by the corresponding Ritz values. When this happens, the method behaves as if the corresponding eigenvectors are no longer present [4,21]. Thus, it is necessary to perform some iterations before the Ritz values are sufficiently approximated, and the influence of the corresponding extreme eigenvalues is eliminated.
Augmented Krylov subspace methods Aim to add information about the problem into the search space by including into the Krylov subspace an augmentation space U that, otherwise, could be obtained only after some Krylov iterations [39]. With these methods, the usual search space K k A, r 0 is replaced by the augmented Krylov subspace, given by: Hence, the solution is found with the following iterative process:

Deflated Krylov methods
On the other hand, instead of the inclusion of the augmentation space U on the search space, the iterate is constructed, so that the residual r k satisfies the orthogonality condition given by: with P being the deflation operator that usually contains information of the extreme eigenvalues. This approach is known as deflation approach. With deflation, the effect of the small eigenvalues can be annihilated by setting them to zero, such that the condition number of the system is reduced before the Krylov iterative process. The method is consequently accelerated, and the superlinear convergence phase is reached earlier [1,2,8,22,29]. A formal definition is presented next.
Definition 1 Let A ∈ R n×n be an SP D matrix, and Z ∈ R n×p be a full rank matrix. The invertible Galerkin matrix, E ∈ R p×p , the correction matrix, Q ∈ R n×n and the deflation matrix P ∈ R n×n are defined as [26,29,49]: where Z ∈ R n×p is called the def lation − subspace matrix, and its columns are the def lation vectors or proj ection vectors.
A good selection of the deflation vectors is usually problem-dependent, and available information of the system is, in general, used to construct them. Most of the techniques used to select deflation vectors are based on eigenvectors or approximated eigenvectors, recycling vectors [16,33], subdomain deflation vectors [9] or multigrid and multilevel based deflation vectors [7,18,26]. A description of some of these choices of deflation vectors is presented next.

Eigenvectors of the system matrix as deflation vectors
If the matrix Z contains eigenvectors corresponding to the most unfavorable eigenvalues of the system matrix A, the deflation method set them to zero and convergence of the iterative method is achieved faster. If the deflation-subspace matrix, Z is chosen as p eigenvectors, v j , of A, i.e., Z = v 1 , . . . , v p , then PA has the same eigenvectors as A and the spectrum is given by [14,49]: However, the eigenvalues are usually unknown and it is costly to obtain them. Thus, approximate eigenvalues are used instead.

Recycled vectors as deflation vectors Given a linear system
where the x i s are linearly independent (l.i.) solutions such that Ax i = b i , and any other solution can be constructed as a linear combination of this solutions. These solutions can be used to construct the deflation subspace matrix Using this choice of deflation vectors, the solution of the linear system is obtained within one iteration of DCG (see the proof in [14,16]).
Hence, if the unfavorable eigenvalues are captured in the l.i. solutions x i and x j , the behavior of the deflated method is the same as if these eigenvectors are used as deflation vectors.

Preconditioned deflated conjugate gradient method (DPCG)
Deflation techniques are usually combined with a preconditioner M −1 to achieve a larger acceleration of iterative methods. When using these techniques to accelerate the CG method, this results in the DPCG method. The pseudocode of the DPCG method is given in Algorithm 1. In this work, we implement the Deflated Preconditioned Conjugate Gradient method with the Incomplete Cholesky factorization as preconditioner; we refer to this method as DICCG.
For this method, the error is bounded by: is the effective condition number. As mentioned before, deflation sets the smallest eigenvalues to zero; then, these eigenvalues are not longer taken into account for the iteration process. Instead, λ min M −1 PA , the smallest non-zero eigenvalue of M −1 PA, is used to study the convergence.
This solution method is studied throughout this work, using an Incomplete Cholesky factorization as preconditioner. As deflation vectors we use Recycled vectors in two ways: solutions of previous time steps as deflation vectors, and a Proper Orthogonal Decomposition (POD) basis as deflation vectors. This basis is obtained from previous time step solutions. To illustrate how this POD basis is constructed, next, we introduce the POD method.

Proper orthogonal decomposition (POD)
The POD method is a Model Order Reduction (MOR) method, where a high-order model is projected onto a space spanned by a small set of orthonormal basis vectors Ψ = ψ 1 ψ 2 .. ψ p , Ψ ∈ R n×p . The basis vectors ψ i ∈ R n are computed from a set of s 'snapshots' {x i } s i=1 , obtained by simulation or experiments [42]. The vectors ψ j p j =1 are p eigenvectors corresponding to the p largest eigenvalues λ j p j =1 of the data correlation matrix R ∈ R n×n , In some cases, the covariance matrix R is used instead of R, this matrix is defined as x i is the mean of the snapshots. In this work, we also normalize the snapshots, i.e., we use the following relation: If the system is large, the matrix R is also large, and to compute the eigenvalues can be costly. However, it is not necessary to compute the eigenvalues of R = XX T ∈ R n×n , but instead, it is possible to compute the eigenvalues of the much smaller matrix C = X T X ∈ R s×s , s << n. To do so, we perform the Singular Value Decomposition (SVD) of C = VDV T . Here V ∈ R s×s are the eigenvectors of C, and diag(D) are the corresponding eigenvalues, which are the same as the eigenvalues of R. The eigenvectors of R can be obtained from [17]: Once the basis is computed, the high dimensional variable x ∈ R n is approximated by a linear combination of p orthonormal basis vectors [40]: The p eigenvectors are chosen such that they contain almost the whole variability of the snapshots. Usually, they are the maximal number of eigenvalues satisfying [42]: with α close to 1. The eigenvalues λ j are ordered from large to small with λ 1 being the largest eigenvalue of R.
Once the basis Ψ is obtained, the linear system from Eq. 1 is projected onto the subspace spanned by the basis [40] as follows: leading to the reduced model: The reduced model is dense; however, it is much smaller than the original system and can be solved efficiently with direct methods.

POD-based deflation method
In this work, we further explore and develop the POD-based deflation method introduced in [16]. Instead of solving the reduced model, as in [28,35], we propose the reuse of a POD basis, Ψ , as the subspace-deflation matrix, Z, in a deflation procedure. With this method, the residual is projected into the deflated space constructed with few elements of the basis Ψ , as presented in Eq. 8. The approximate solution is obtained from the Krylov subspace directly (K k ) for the full system.
The problems we study contain a few small eigenvalues of the system matrix A t that cause slow convergence. Deflation methods can remove these eigenvalues if the corresponding eigenvectors are used. As shown in Eq. 3, for the case of the CG method, the convergence depends on the ratio of the largest (λ max ) to the smallest (λ min ) eigenvalues. Thus, when the smallest system eigenvalue (λ min ) is 'removed' from the system, the convergence depends on the "effective" condition number, given as the ratio between the largest (λ max ) and the second smallest (λ min−1 ) system eigenvalues [4,5].
A POD basis, ideally, can get the main system information from a set of snapshots, X. In particular, the basis should be an approximation of the system eigenvectors. As an example, we take as snapshot an eigenvector of the system matrix Av k = λ k v k , X = [v k ]. Then, the correlation matrix is given by which implies that Ψ = [v k ], and using this basis in a deflation procedure, the corresponding eigenvalue λ i is removed from the system. Thus, if the eigenvalues of the system matrix A hampering the performance of an iterative method are captured on the POD basis, they can be removed using this basis as deflation vectors. Taking a snapshot x i as in Lemma 1, if it contains information about one eigenvalue, i.e., x i ∼ v i , a basis obtained from it will be the same as before, i.e., Ψ = [v k ]. When used as a deflation vector, this eigenvalue will be removed from the system.

Methodology
In this work, we study the performance of the Deflated Preconditioned Conjugate Gradient method preconditioned with Incomplete Cholesky (DICCG) and a POD basis as deflation vectors. For this method, the complete solution strategy consists of three stages: i) Snapshots collection. A set X of snapshots, vectors b i with different configurations or at various time steps, is obtained for the computation of the POD basis. To capture the snapshots, we use two approaches. The first one is a moving window approach, and the second one is a training simulation approach, both of them used for non-invariant systems, These approaches are further explained below.
Moving window approach: the window consists of a set of s snapshots updated for each time step.
The snapshots are captured 'on the fly,' i.e., they are the most recently computed solutions, obtained during the previous time steps. Given a time step t, we collect the previous s snapshots (from t − s − 1 to t − 1). These snapshots are given by: With this approach, the first t snapshots cannot be computed for the first time steps with the DICCG method, as we lack a deflation subspace matrix Z. Instead, the first s time steps are computed with the ICCG method. The rest of the snapshots are obtained with DICCG. Training phase approach: a full pre-simulation is run. For a pre-simulation consisting of n time steps. The set of snapshots is given by: During the training phase, the bottom hole pressures (bhp) of the production wells P is varied randomly, and it takes values between P 1 and P 2 , which implies that the right-hand side b t is constantly changing during the simulation.
The solutions of the pre-simulation are obtained with the ICCG method.
ii) POD basis computation.The previously obtained snapshots are used to construct a POD basis ( ). The algorithm of the construction of the basis is presented below, Algorithm 2. iii) Solution of the linear system. The POD basis is used as subspace-deflation matrix (Z) in a deflation procedure for the solution of the linear system. The algorithms of the complete procedure are presented in Algorithm 3 for a moving window approach, and Algorithm 4 for a training phase approach.
To analyze the method's performance, we compare the total number of iterations and operations necessary to run the DICCG simulation with the total number of iterations and operations necessary to solve the same problem using ICCG.
As tolerance or stopping criterion we use the relative residual, defined as the 2-norm of the residual of the k th iteration divided by the 2-norm of the right-hand side of the preconditioned system, throughout this work, we refer to this term as rr k . The tolerance of the solvers is presented for each case. For some experiments, the true relative error is computed as: where x is the 'true' solution, that in this case is the solution computed with backslash from Matlab, and x k is the approximation computed with ICCG or DICCG.
Complexity Taking p as the number of deflation vectors, m as the sparcity of the matrix, i.e., the number of nonzero diagonals, and s as the number of snapshots, the approximate number of operations to solve a system with n unknowns is given in Table 1.
Besides the initial work the DICCG if a POD basis is used, the cost of the computation of the basis Ψ is 4p 2 + s − 1 n + 8s 2 − 2s + 1 s. More details about the Table 1 Complexity of the methods

Method
Initialization Iteration computation of operations can be found in [14,15,17,23] and Appendix D.

Numerical experiments: results and discussion
In this work, we focus on the solution of systems of linear equations resulting from water flooding for immiscible fluids, oil, and water, flowing through highly heterogeneous 2D and 3D reservoirs. To decouple pressure from saturation, we use the fractional flow formulation (see Appendix A). We solve the system with sequential schemes using the Matlab Reservoir Simulation Toolbox (MRST) [30]. The solution of the resulting transport equation is obtained with MRST using implicit schemes, and the linear pressure system is solved with the proposed POD-based DICCG methodology and compared to the ICCG method. We study various stopping criteria for the solution methods: = 5 · 10 −5 , = 5 · 10 −7 , and = 5 · 10 −9 .
The models studied in this work are: an 'academic' layered reservoir with a contrast in permeability coefficients up to 10 6 , and the SPE 10 benchmark with a contrast in permeability coefficients of O(10 7 ) [36]. We include capillary pressure and gravity terms in some of the studied cases.

Heterogeneous permeability layers
In this section, we study water injection into an 'academic' system consisting of equally-sized layers with a constant porosity field of 0.2 and different permeability values (see Fig. 1). A set of layers with permeability σ 1 = 1 mD is followed by layers with permeability σ 2 = 10 6 or 10 1 mD.  Oil Units The contrast between permeability coefficients is given by The domain consists of a Cartesian grid of 35 x 35 cells for a 2D case, and 25 x 25 x 25 cells for a 3D case of length 8 m. The fluid properties are presented in Table 2.
The first set of experiments does not consider gravity and capillary pressure terms. Later, we include capillary pressure using the relationship p c = C(1 − S), and the 3D problem includes gravity terms. The relative permeability model used is the Corey model, with exponents n w = n nw = 2 (see Table 2). The studied test cases are: TC1. c σ = 10 −1 , no capillary pressure terms included. TC2. c σ = 10 −6 , no capillary pressure terms included. TC3. c σ = 10 −1 , capillary pressure terms included. TC4. c σ = 10 −6 , capillary pressure terms included.
We study water flooding with injection through the boundary and wells. When using wells, the setup consists of four producers, P i , on the corners and one injector, I , in the center. The wells are controlled by prescribing the bottom hole pressure bhp values.

Injection through the left boundary
The injection is performed through the left boundary at a rate of 0.4 m 3 /day for the 2D case and 4 m 3 /day for the 3D case. The pressure is set as zero at the right boundary and the initial pressure of the reservoir is 100 bars (See Table 3). The simulation is run for 480 days with time steps of 1 day (see Table 3). We study the deflation procedure in a moving window approach (MW), with two different selections of deflation vectors, p. For the first one, we use the ten previous time step solutions, p = 10, represented by DI CCG 10 . For the second  The pressure field and oil saturation are presented in Figs. 2 and 3. We observe that the pressure is higher on the boundary where water is injected, and it decreases towards the right boundary. We also observe that the layered pattern influences the pressure and saturation fields.
In Fig. 4, we present the eigenvalues of a) the system matrix, b) preconditioned system matrix, deflated and preconditioned system with c) ten snapshots, and d) five POD basis vectors as deflation vectors. We observe that for the cases with a larger contrast in permeability coefficients (TC2 and TC4) the spectrum of the system is more spread, which implies that the system is more ill-conditioned. We note that after preconditioning, the spectrum is clustered, but there are few eigenvalues smaller than the rest.
For the ill-conditioned systems (TC2 and TC4) these few eigenvalues are smaller than in the other cases (TC1 and TC3), slowing down the convergence of the method. After deflation, these small eigenvalues are set to a value close to zero (10 −16 in machine precision). We can note that using (c) ten snapshots or (d) five POD-based deflation vectors, the smallest eigenvalues of the preconditioned system are set to zero, leading to a similar spectrum (without the removed eigenvalues). This implies a similar performance with both selection of deflation vectors.
The number of iterations necessary to achieve convergence is summarized in Table 4, where the first column shows the test case. In the second column, we present the number of deflation vectors used, p. The third one shows the number of iterations necessary to achieve convergence with the ICCG method only. Note that this number is the same for each test case, i.e., only the number of DICCG iterations and work change if a different number of deflation vector is used.
The number of iterations necessary for convergence of the deflation method is presented in the sixth column. For this example, we use the Moving Window (MW) approach. Therefore, the first p time steps are computed with the ICCG method (fourth column), and the rest of the time steps are computed with the DICCG method (fifth column). Then, we compute the percentage of DICCG iterations with respect to the total number of ICCG iterations. In the last column, we present the percentage of ICCG work required to perform the simulation. The total work of each method is computed as the number of iterations times the number of operations per iteration, plus the initial work and the extra work required to compute the POD basis.
In Table 4, we observe a significant reduction in the number of iterations and the work required when using the DICCG method. For the 2D problem, test cases TC1, TC2, the number of iterations is reduced to 14% and 12% the number of ICCG iterations when using ten snapshots as deflation vectors, and to 19% and 16% when using five POD-based vectors as deflation vectors. For the same test cases, the work required to perform the simulation is reduced to around 33% and 28% for ten deflation vectors and to 61.7% and 47.6% using five, respectively. Thus, the TC2 case presenting a contrast between permeability layers  , cases a TC1, b TC2, c TC3, d TC4  Fig. 4 Eigenvalues of a the system matrix, b preconditioned system matrix, deflated and preconditioned system with c ten snapshots, and d five POD basis vectors as deflation vectors of 10 −6 and no capillary pressure terms showed a slightly better performance.
If capillary terms are included (TC3 and TC4), the percentage of ICCG iterations and the total work increase.
For these cases, the reduction in the number of iterations is similar for DICCG 10 and DICCG P OD 5 methods, which implies that almost the same information that is contained in the ten snapshots is captured in the five POD basis vectors. The cases TC1 and TC3 do not contain capillary pressure terms, while the TC2 and TC4 cases contain these terms. The sub-index P OD in the p column means that the deflation vectors are POD basis vectors, if no sub-index is included, it means that the vectors are snapshots However, computing the POD basis requires more work. Using ten snapshots, the number of iterations is reduced to 27 and 20% the number of ICCG iterations, and to 64 and 48% of the ICCG work. If five POD vectors are used, the reduction is 29 and 21% the number of ICCG iterations, and to 79 and 55% of the ICCG work. We observe similar trends for the 2D and 3D cases, the last one, including gravity terms. However, the 3D cases are more expensive in terms of the number of iterations and the total work than the 2D cases.
In Fig. 5, we show the total number of operations required to perform the work during the iteration process and the extra work. In the case of the DICCG P OD 5 method with five POD-based deflation vectors, the extra work is due to the computation of the deflation vectors. The ICCG and the DICCG 10 methods do not have extra work involved as the ICCG method does not compute any deflation vectors, and the DICCG 10 method reuses the previously computed solutions. The initial work is negligible for all cases. Thus, it is not included in the plots.
Although the work required to perform the iterative process for the DICCG P OD 5 method is less than the work required by the DICCG 10 method (see Fig. 5), the computation of the POD basis increases the number of operations considerably for the DICCG P OD 5 method, especially for the 3D case. Figure 6 shows the relative residual (17), and Fig. 7 presents the true relative error (18) for the 240-th time step for the studied methods, similar performance is observed for the rest of the time steps. We note that the superlinear convergence phase is reached almost immediately for the deflation methods, contrary to the ICCG method that requires some iterations before reaching this phase.  We also note that the true relative error is smaller than the relative residual in most of the cases. Furthermore, after a few DICCG iterations, an accuracy of O(10 −8 ) is reached. If the contrast between permeability coefficients is larger (TC2 and TC4), i.e., more ill-conditioned problems, the true error is O(10 −8 ), after one, and three iterations, respectively.
As mentioned before, the first iterations have an accuracy of O(10 −4 ), thus, we study a case with a less strict tolerance, = 5 · 10 −5 . Table 5 presents the number of iterations for the ICCG and DICCG methods together with the percentage of the ICCG iterations and work required when using the DICCG method for a tolerance of = 5 · 10 −5 .
We observe that the reduction in the number of iterations when using the deflated methods ranges from 7-21% of the ICCG iterations, which is larger than when a more strict stopping criterion ( = 5 · 10 −7 ) is required. Similarly, the reduction of the ICCG work is smaller, ranging from 17.5-68% for the 2D case.
The work required to perform the iteration process and the extra work for this problem is presented in Fig. 8 for all methods, the initial work is negligible in all cases, thus, it is not presented.
We observe that the largest reduction in work is achieved for the DICCG 10 method in all the cases. A reduction of the ICCG work is achieved when using the DICCG P OD 5 for all the cases in the 2D problem, but only for TC1 and TC3 for the 3D problem. The ICCG work required for the TC2 and TC4 is small and therefore, the cost of computing the POD basis increases the total work of the DICCG P OD 5 method.
We observe that for the DICCG 10 method, the maximum number of iterations is reached (150). This implies that the Fig. 8 Total work for various methods, = 5 · 10 −5 , Left: 2D cases, Right: 3D cases. The initial work is negligible in all cases method does not converge. The set of ten deflation vectors used are the ten previous solutions, and linear independence is not guaranteed for this set. This implies that computing the Galerkin matrix E is not possible, and convergence of the deflation method is not guaranteed.
We also note that for the TC2, 3D case, the reduction in the number of iterations is larger if we use five POD deflation vectors. To understand this behavior, we plot the number of iterations for all time steps in Fig. 9.
On the other hand, the POD basis is linearly independent by construction. Therefore, using vectors of this basis as deflation vectors (DICCG P OD 5 ) does not present the same problems as the recycled snapshots (DICCG 10 ). As a consequence, the linear independence of the POD results in a more robust deflation method.
Note that, for the methods used in this work we need an initial guess. When solving the first time step, this initial guess is not necessarily close to the solution, hence, it needs a larger number of iterations to converge. For the following time steps, the initial guess is the solution of the previous time step, and as the system does not change dramatically from one step to the next one, the solution is reached faster than the initial time step.

Discussion
We observe a reduction in the work and the number of iterations required to simulate waterflooding in a layered reservoir. If the contrast between permeability layers increases, the reduction is more significant. This latter observation shows the better performance of the PODbased deflation method for ill-conditioned systems.
We also observe that using ten snapshots or five POD basis vectors as deflation vectors gives a similar reduction in the number of iterations. However, the reduction in the work is larger if ten snapshots are used as deflation vectors due to the cost of computing the POD basis. Nevertheless, using the POD basis (DICCG P OD 5 ) avoids non-convergence problems that the DICCG method might present when using only snapshots (DICCG 10 ).
Including capillary pressure terms slightly deteriorates the performance of the methods. This behavior might be due to the fact that including capillarity pressure terms reflects on changes on the right-hand side, which is not effectively treated by the deflation method. However, more studies are required to confirm this hypothesis.
We note that after a few iterations, the residual is of order 10 −4 . Thus, using a less strict stopping criterion (10 −5 instead of 10 −7 ) results in less work of the DICCG method compared to the ICCG method. However, this is only the case for the DICCG 10 method. As a less strict criterion is used, the ICCG method also requires fewer iterations, and computing the basis for the DICCG P OD 5 method leads to a more costly method.

SPE 10 benchmark
The SPE 10 model consists of 60 x 220 x 85 cells. In this work, we study a 2D case containing the top layer and a 3D case with the first 36 layers, including gravity terms. For the problem with one layer, the system matrix has a condition number of O(10 7 ), while for the 36 layers, the condition number is of O(10 8 ). As the convergence of the methods depends on the condition number, to achieve similar results, we impose a more strict stopping criterion for the 3D case. We study cases with and without capillary pressure terms: TC1. No capillary pressure terms included. TC2. Capillary pressure terms included.
We consider injection through the boundary and injection through wells. The wells' setup consist of one injector and four producers (see Fig. 10). We compare the performance of the POD-based deflation method against the standard ICCG method for the solution of the pressure equation.
Application of the DICCG method requires the construction of a deflation matrix based on a POD basis obtained from a series of snapshots selected in two ways: a moving window, and a training phase approach (see Section 2).
For the training phase approach, the POD basis and deflation matrix are obtained off-line in a training run solved with the ICCG method. Then, a series of simulations are performed using the DICCG method with diverse values of bottom hole pressure, bhp, in the producers. The fluid properties and the capillary function used for this problems are the same as in Section 3.1.1 (see Table 2).

Injection through the left boundary, moving window approach
Water is injected through the left boundary at a constant rate of 600 m 3 /day to a reservoir initially filled with oil. The initial reservoir's pressure is 500 [bars], and the pressure at the right boundary is set to zero [bars]. We simulate 480 time steps with a step size of 15 days for the 2D case. For the 3D case, we use 36 layers, and we simulate 240 time steps with a step size of 1 day and an injection rate of 1000 m 3 /day (see Table 6).
For the DICCG method we use the moving window approach (MW), selecting ten snapshots (from previously computed time steps) and five POD basis vectors as deflation vectors for the 2D case. The pressure field, the water saturation and the eigenvalues of the correlation matrix are presented in Fig. 11 for the 240-th time step.
Regarding the eigenvalues, we observe that when we include capillary pressure terms (TC2) they are larger, which indicates that they still contain important system information, and therefore, more eigenvectors are required to obtain a better performance of the deflation method.
The number of iterations required to achieve an accuracy of = 5 · 10 −7 for the 2D case, and = 5 · 10 −9 for the 3D is presented in Table 7 for the ICCG and DICCG methods. We observe a reduction to 11% of the number of ICCG iterations and to 24.5% the total ICCG work when using ten snapshots as deflation vectors DICCG 10 . If five  POD-based deflation vectors are used DICCG P OD 5 , the number of ICCG iterations is reduced to 12%, and the total ICCG work to 30.4%. If we include capillary terms, we observe an increment in the number of iterations, and, consequently, in the total work. The percentage of ICCG iterations increases to ∼19% and the ICCG work to ∼43% for both cases. As we mentioned before, more eigenvectors are required for the TC2 to gather the same information as in TC1, this is reflected in a better DICCG performance for the TC1.
For the 3D problems, we note that we reduce the number of iterations and the total work significantly. For the TC1, we achieve a reduction of ∼27% of the ICCG work when using 10 snapshots as deflation vectors (DICCG 10 ), and ∼35% when using 5 POD vectors (DICCG P OD 5 ). For the TC2, the reductions in work are ∼72% for DICCG 10 , and ∼66% for DICCG P OD 5 .
In Fig. 12, we show the work required to perform each part of the simulation for the 2D case. The initial work for all methods is negligible compared with the rest of the work, thus, it is not included. The only method that requires a computation of extra work is the DICCG P OD 5 method with five POD-based deflation vectors. For this method, the extra work is the computation of the basis (see Appendix D).
We note that the work performed during the iterative process is smaller when using five deflation vectors (DICCG POD 5 ). However, taking into account all the operations involved, using five POD vectors or ten solutions as deflation vectors The cases TC1, do not contain capillary pressure terms, while the TC2 cases contain these terms. The subindex P OD in the p column means that the deflation vectors are POD basis vectors, if no sub-index is included, it means that the deflation vectors are snapshots Fig. 12 Total work for various methods, = 5 · 10 −7 , SPE layers, Left: 2D cases, Right: 3D cases. The initial work is negligible in all cases requires a similar number of operations for the TC2, and slightly more for the DICCG P OD 5 method for the TC1. The relative residual of the studied methods is shown in Fig. 13 for the 240-th time step for the 2D case, similar performance is observed for the rest of the time steps. We note that the first iteration gives a significant reduction for all methods. However, for the ICCG method, the reduction rate is small, while for the DICCG method, in both cases, this rate is larger and the desired accuracy is achieved faster. This behaviour implies that the superlinear convergence phase is achieved for the DICCG method after the first iteration.
The true error is presented in Fig. 14. Here, we observe that after the first iteration, a good approximation is achieved for the DICCG method, O(10 −5 ), but the ICCG method takes longer to achieve the same accuracy. Thus, the relative residual does not show the behaviour of the true error for the ICCG method.
These results show that the most important system information is contained in the deflation vectors and, therefore, convergence is achieved in a small number of iterations. If the desired accuracy is small, e.g., 10 −5 , only one iteration is required. This results are similar to the results obtained for the layered problems.
In the next section we compare the approach used here (MW) with the training phase (TP) approach for a case where water is injected through a well located in the center of the reservoir.

Injection through wells, MW and TP approaches
In this section, we perform a series of experiments injecting water through a well located in the center of the reservoir with a prescribed rate of injection. The fluids are produced in the corner wells, where we prescribe the pressure (see Fig. 10), i.e., we give the bottom hole pressures (bhp) of the wells.
For the moving window approach (MW) examples, the pressure is kept constant for all wells during all the time steps. For the training phase (TP) approach, first, we perform a complete simulation (training run) with variable pressures in the wells. The solutions obtained from the training run are used to obtain the POD-based deflation vectors. These vectors are the deflation vectors required to implement the DICCG method. We solve a series of problems with constant wells pressures.
We solve the first layer (2D case) and the first 36 layers (3D case) of the SPE 10 benchmark. For the MW approach, we select the ten previous solutions and five largest POD basis vectors as deflation vectors. For the TP approach, we select ten and five of the largest POD basis vectors as deflation vectors.
Moving window approach (MW) For this case, the pressure inside the wells is set at 275 [bars] in the producers and 1100 [bars] in the injector. The test cases of this approach are referred to as TC1 MW when it has no capillary pressure    The pressure field and the water saturation are presented in Fig. 16. The eigenvalues of the correlation matrix for the 240-th time step of the MW approach, and for the training phase run are presented in Fig. 17.
For the MW approach, we observe that the eigenvalues are similar for the two studied cases (TC1 MW and TC2 MW ). This indicates that their corresponding eigenvectors contain comparable information, and we expect a similar behaviour. For the TP approach, we note that the eigenvalues of the TC4 T P (including capillary pressure terms) are smaller, hence, they contain less information. Therefore, we require more eigenvectors to get similar performance as in the TC3 T P (no capillary terms included).
The number of iterations required to obtain an approximate solution with an accuracy of = 5 · 10 −7 for the ICCG and DICCG methods is presented in Table 8 for the MW and TP approaches for the 2D case. Note that (see Section 2.4), for the MW approach we use ten deflation vectors and five POD deflation vectors computed during the simulation. However, a single POD basis is obtained  The cases TC1, TC3, TC5 and TC7 do not contain capillary pressure terms, while the rest of the cases contain these terms. The sub-index P OD in the p column means that the deflation vectors are POD basis vectors, if no sub-index is included, it means that the deflation vectors are snapshots off-line, and this basis is used for all the test cases of the TP approach (TC3 T P -TC8 T P ).
In Fig. 18 we present in a) the total work required for each part of the methods. Note that, there is extra work only for the MW approach with five POD vectors, for the TP run the deflation vectors are computed offline in a training run. In b), the percentage of total ICCG work is presented for the DICCG method with five and ten deflation vectors. We note that the MW approach gives a larger reduction in the number of ICCG iterations, around 10% for the 2D cases (TC1 MW and TC2 MW ). As a consequence, the reduction of the total work is also larger for this case. We also note that, using five POD-based deflation vectors for the MW case requires more work than using ten, due to the extra work required to compute the basis (see Fig. 18).
For the TP approach, we note that the results are similar for the three cases. This result implies that the basis computed from the training phase where the bhp of the production wells is varied in a range can be used to solve problems with different bhp of the production wells inside the range TC3 T P and TC4 T P (275 [bars]), in the limits TC5 T P and TC6 T P (200 [bars]), and outside the range TC7 T P and TC8 T P (400 [bars]). Furthermore, we note that the reduction of the work is slightly larger when using ten deflation vectors.
The relative residuals and true relative errors of the studied methods are presented in Figs. 19 and 20. As in the previous cases, we observe a good approximation O(10 −4 ) after the first DICCG iteration for the relative residual, similar to the true relative error in all the studied cases. Regarding the ICCG method, even if the relative residual appears to be O(10 −6 ) the error is 10 −4 for the MW approach (TC1 MW and TC2 MW ) and 10 −6 for the TP approach (TC3 T P and TC4 T P ). Thus, they do not reach the desired accuracy (10 −7 ), however, the error of the TP approach it is closer to the desired accuracy than the MW approach.
We also note a faster convergence rate for DICCG methods compared to the ICCG method during the first iterations, i.e., the superlinear convergence phase is achieved during the first iterations for the DICCG method.
The 3D cases are presented below: For these cases (see Table 9), the reduction on the number of ICCG iterations and ICCG work is smaller than for the 2D cases. Implementing the MW approach (TC9 MW and TC10 MW ), the reduction on the ICCG work is ∼37% when using ten snapshots as deflation vectors (DICCG 10 ), and ∼44% using 5 POD-based deflation vectors (DICCG P OD 5 ). When implementing the TP approach (TC11 T P and TC12 T P ), the reduction on the ICCG work is ∼45% when  The cases TC9 and TC11 do not contain capillary pressure terms, while the TC10 and TC12 cases contain these terms. The sub-index P OD in the p column means that the deflation vectors are POD basis vectors, if no sub-index is included, it means that the deflation vectors are snapshots using ten POD-based deflation vectors (DICCG P OD 10 ), and ∼72% using 5 POD-based deflation vectors (DICCG P OD 5 ), making this approach more expensive, specially when using 5 POD based deflation vectors.
Discussion As in the previous section, we observe an important reduction on the the number of ICCG iterations when using a deflated version of this method, during the simulation of waterflooding for the SPE 10 benchmark, for the 2D case. Furthermore, the first iteration leads to a reduction of ∼O(10 −4 ). A similar performance is observed for cases with and without capillary pressure terms. However, less work is required for the cases without capillary pressure terms. The MW approach results in a better performance of the DICCG method. However, it does not reach the desired accuracy for some test cases, but it gives a better approximation than the ICCG method. Regarding the TP approach, a single basis computed offline is used to solve diverse problems, resulting in similar performance for all of them. For the 3D case, the reduction on the % of ICCG iterations is smaller, and as consequence the reduction on the total work is also smaller.
For the TP cases, the best performance is observed using ten POD basis vectors as deflation vectors.

Conclusions
In this work, we study the POD-deflation method for twophase reservoir simulation problems. We test this methodology for several cases presenting a contrast in permeability coefficients up to O(10 8 ) and including capillary pressure and gravity terms. We study two different approaches: Moving Window (MW) and Training Phase (TP), that differ in they way the snapshots are collected.
We compare the DICCG (Deflated Conjugate Gradient preconditioned with Incomplete Cholesky) method with the ICCG (Conjugate Gradient preconditioned with the Incomplete Cholesky factorization) method, and we found that: -The performance of the DICCG method depends on the information contained in the POD basis. It is mainly related to the eigenvalues of the correlation matrix.
If only a few eigenvalues are noticeably larger than the rest, they contain most of the system information leading to better DICCG performance. -Using the POD basis as deflation vectors requires some extra work, however, this ensures linear independence of the vectors, making the deflation method more robust. -The total work is reduced up to 23% the ICCG work when using the DICCG method, for the 2D case, and up to 35% for the 3D case. -The MW approach showed a better performance than the TP approach for the studied cases. -A slightly better performance is observed if no capillary pressure terms are included. -For the TP approach, if a single basis, computed offline is reused for alike problems, a similar performance of the DICCG method is observed. Hence, once a basis is computed, it can be used to solve several problems. -In some cases, the required accuracy is not achieved for both methods. However, the DICCG method is slightly more accurate. Thus, the DICCG method is more robust than the ICCG method for the studied problems.
We introduce this methodology for reservoir simulation examples, but it can be adapted to any time-varying problem. Moreover, we test it using the Preconditioned Conjugate Gradient method, but it is independent of the method, and it can also be implemented in combination with other iterative methods or preconditioners like multigrid.

Appendix A: Two-phase flow through porous media
For the simulation of two-phase (oil-water) flow through a porous medium, we can consider the phases as separated, i.e., they are immiscible and there is no mass transfer between them. We usually consider one of the fluids as the wetting phase (w), which is more attracted to the mineral particles than the other phase, known as the non-wetting phase (nw). In the case of a water-oil system, water is considered the wetting phase.
The saturation of a phase (S α ) is the fraction of void space filled with that phase in the medium. When modeling two-phase flow, the permeability of each phase, α, will be affected by the presence of the other phase. Therefore, an effective permeability, K α , has to be used instead of the absolute permeability K, and relative permeability values are taken into account.
The previously-mentioned equations can be separated into a pressure equation and a saturation or transport equation via the fractional flow formulation [27], approach used in this work. For an immiscible, incompressible flow, the pressure equation becomes elliptic and the transport equation becomes hyperbolic. With this formulation, the pressure and transport equations are solved in separate steps in a sequential procedure, for more details see [14]. This approach is used throughout this work, therefore, we present a brief description of the method in Appendix B.
which, together with the previously computed velocity v w , transforms the transport Eq. (20), idem for the rest of the referenced equations in Appendix A to: φ ∂S w ∂t +∇·[f w (v w +λ nw ΔρgΔd)]+∇·(f w λ nw ∇p c ) = q w , where Δρ = ρ w − ρ nw . With this approach, the system is expressed in terms of the non wetting phase pressure, (27), and the saturation of the wetting phase, (31). In the pressure equation, the coupling to saturation is present via the phase mobilities, and the derivative of the capillary function. For the saturation, we have an indirect coupling with the pressure through the total Darcy velocity, (25). With this scheme, the equations are solved for the pressure using the previously computed saturation, and the saturation is updated by substituting the computed pressure.

Appendix C: Discretization methods
In this work, we use the sequential scheme to simulate twophase flow. With this approach, an unknown is fixed, e.g., the saturation of the wetting phase (S w ), and the resulting elliptic equation is solved for the pressure of the non-wetting phase (p nw ). Once p nw is computed, we update the total velocity, v, and we solve the parabolic transport equation for S w (more details in [14]).
The resulting system depends on space and time. The space derivatives are discretized using finite differences scheme; for the temporal discretization, we use the backward Euler method, details can be found in [14]. In the examples presented in Section 3, the discretization is performed with the Matlab Reservoir Simulation Toolbox, MRST [30].

Appendix D: Complexity
This appendix is devoted to the computation of the number of operations necessary to implement the methods studied throughout this work. For the implementation of the PODbased deflation method, first, we need to compute the snapshots with the ICCG method. Once the snapshots are computed, we obtain the eigenvalues ( ) and eigenvectors (V) of the covariance matrix R ∈ R n×n by computing the SVD of R T = XX T ∈ R p×p . The eigenvalues of both matrices are the same and the eigenvectors of R can be computed from: where V are the eigenvectors of R T (see [17,40]). The eigenvectors corresponding to the largest eigenvalues are selected as the POD basis. The operational cost of the ICCG method, the DICCG method, and the SVD process is presented in Table 10 (see [17]). For the computation of the number of flops of the DICCG method, we assume that the matrix Z is already given, i.e. it does not change during the iteration process. The flops are computed for sparse matrices with m the number of non-zero diagonals, and p the number of deflation vectors.

Conflict of Interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommonshorg/licenses/by/4.0/.