Data assimilation with soft constraints (DASC) through a generalized iterative ensemble smoother

This work investigates an ensemble-based workflow to simultaneously handle generic, nonlinear equality and inequality constraints in reservoir data assimilation problems. The proposed workflow is built upon a recently proposed umbrella algorithm, called the generalized iterative ensemble smoother (GIES), and inherits the benefits of ensemble-based data assimilation algorithms in geoscience applications. Unlike the traditional ensemble assimilation algorithms, the proposed workflow admits cost functions beyond the form of nonlinear-least-squares, and has the potential to develop an infinite number of constrained assimilation algorithms. In the proposed workflow, we treat data assimilation with constraints as a constrained optimization problem. Instead of relying on a general-purpose numerical optimization algorithm to solve the constrained optimization problem, we derive an (approximate) closed form to iteratively update model variables, but without the need to explicitly linearize the constraint systems. The established model update formula bears similarities to that of an iterative ensemble smoother (IES). Therefore, in terms of theoretical analysis, it becomes relatively easy to transit from an ordinary IES to the proposed constrained assimilation algorithms, and in terms of practical implementation, it is also relatively straightforward to implement the proposed workflow for users who are familiar with the IES, or other conventional ensemble data assimilation algorithms like the ensemble Kalman filter (EnKF). Apart from the aforementioned features, we also develop efficient methods to handle two noticed issues that would be of practical importance for ensemble-based constrained assimilation algorithms. These issues include localization in the presence of constraints, and the (possible) high dimensionality induced by the constraint systems. We use one 2D and one 3D case studies to demonstrate the performance of the proposed workflow. In particular, the 3D example contains experiment settings close to those of real field case studies. In both case studies, the proposed workflow achieves better data assimilation performance in comparison to the choice of using an original IES algorithm. As such, the proposed workflow has the potential to further improve the efficacy of ensemble-based data assimilation in practical reservoir data assimilation problems.


Introduction
Data assimilation aims to estimate the quantities of interest (QoI), e.g., model variables (states and/or parameters), based on certain sources of information. In a conventional setting, the sources of information largely come from Xiaodong Luo xluo@norceresearch.no 1 Norwegian Research Centre (NORCE), 5008 Bergen, Norway 2 University of Stavanger, 4021 Stavanger, Norway observed data, which are connected to the QoI through some underlying relations, e.g., in the forms of observation operators or forward simulators (with possible observation errors). In real-world problems, additional sources of information may also be available, e.g., in terms of the possible ranges of model variables, or certain physical laws (e.g., mass conservation) that govern the dynamical systems constituted by the model variables. These additional sources of information constitute practical constraints, and data assimilation incorporating both observed data and available constraints is often referred to as constrained data assimilation [19] within the community of data assimilation.
A constrained data assimilation problem can be tackled from different perspectives, which leads to a variety of constrained assimilation algorithms [2,34]. For instance, certain constrained assimilation algorithms are designed to handle a given type of constraints, e.g., equality constraints [19,35] or inequality constraints [6,32,33,40], or a mixture of both equality and inequality constraints [1,14,18].
There are also different ways of incorporating the constraints. For instance, when dealing with equality constraints, one may treat the equality-constraint system as a perfect observation operator without incurring any observation errors. These constraints are referred to as "perfect measurements" in [34]. Following this standpoint, one can combine the equality-constraint system and the original observation system into an augmented observation system. In this case, solving the constrained data assimilation problem boils down to simultaneously assimilate the augmented observations. Alternatively, one can adopt a two-step procedure, in which the first step is to estimate the QoI only based on the observed data, while the second step further modifies the estimated QoI to better honor the constraints, e.g., by projecting the estimated QoI obtained at the first step onto the feasible regions induced by the constraints [13,14], or by modifying the Kalman-gain-type matrix [14], or by iteratively refining the list (active set) of model variables that violate the constraints [32]. Under this setting, it is equivalent to sequentially assimilating different sources of information into the QoI. In addition, within the framework of Bayesian data assimilation, where the QoI are probability density functions (PDFs) of model variables, one more option to account for the constraints is to impose certain restrictions on the shapes of the prior PDFs, e.g., in terms of truncated PDFs when dealing with inequality constraints [20].
When dealing with linear constraints, one can recast constrained data assimilation as a quadratic programming problem [18], which can be solved by some general-purpose numerical optimization algorithms [30]. In addition, constrained assimilation algorithms developed from different perspectives to handle linear constraints may become equivalent under suitable conditions [34]. However, when nonlinearity is present in the constraints, more sophisticated strategies are needed, similar to the situation of handling nonlinearity in unconstrained data assimilation problems. Such strategies include, e.g., the first or second order linearization of a nonlinear-constraint system [13,15,35,42,44], unscented transform [39,41], particle approximation [43], moving horizon estimation [3], iterative approximation [9], and their combinations.
The focus of the current work is on constrained data assimilation in reservoir characterization problems. More specifically, we consider data assimilation with soft constraints (DASC), in the sense that the constraints will be incorporated into data assimilation, but an updated reservoir model resulting from a constrained assimilation algorithm may not strictly satisfy the constraints in general [34]. The reason for considering this relaxed setting is a few fold. First of all, although in certain circumstances it might be desirable to require that the constraints be strictly honored, such a requirement is typically satisfied in assimilation problems with linear constraints, where a twostep procedure is often adopted, with the last step designed to honor the constraints.
There are two issues related to this type of two-step procedures: (1) When a correction method is applied to enforce the constraints, the optimality of the updated models obtained at the first (unconstrained) step may be lost, since there could be a conflict between the optimality criterion used at the first step and the objective of honoring the constraints at the second. For instance, if the optimality criterion is to reduce data mismatch, then by applying the second correction step, it is likely that the further modified models may have higher data mismatch than those obtained from the first step, similar to the situation in [22]; (2) When nonlinearity is present, it becomes more challenging to strictly honor the constraints. As aforementioned, a certain approximation method is often adopted to deal with the nonlinearity in the constraint system. As such, approximation errors will typically arise, making updated model variables difficult to strictly satisfy the original constraints.
Furthermore, as discussed in [34], in some cases it may be sensible to only consider soft constraints, e.g., when one does not have the precise knowledge to accurately describe the constraint system. As an example, one may consider the box (or interval) constraints imposed on, e.g., porosity of a reservoir model. A box constraint like [0, 1] would be certainly correct, but may not be sharp in general. As such, it would be reasonable to adopt a narrower constraint interval, e.g., [10 −3 , 0.4], based on some prior knowledge (e.g., well log data). The selection of the lower and upper boundaries, however, is often heuristic, thus a slight violation of the box constraint may not cause a problem in practice.
An additional reason for us to consider soft constraints is its algorithmic convenience, when combining the ideas of "perfect measurements" and "multi-objective optimization", to derive constrained data assimilation algorithms. With the idea of "perfect measurements", we augment the constraint and the original observation systems, and assimilate all sources of information simultaneously. In the course of data assimilation, we aim to find model variables to match all sources of "data" as accurately as possible. Nevertheless, as aforementioned, the objective of matching one particular type of "data" to a good extent may contradict that of matching another type of "data" well. In this sense, DASC is similar to a multi-objective optimization problem. In the current work, however, we do not compute the Pareto front for the sake of computational efficiency.
Our contributions in this work include the following aspects: We develop an ensemble-based workflow to simultaneously handle generic, nonlinear equality and inequality constraints based on a recently proposed umbrella algorithm, called the generalized iterative ensemble smoother (GIES) [23]. This workflow inherits the benefits of ensemble-based data assimilation algorithms, in terms of the derivative-free nature and the ability to provide uncertainty quantification to some extent. Unlike the traditional ensemble assimilation algorithms, our proposed workflow admits cost functions beyond the conventional form of nonlinear-least-squares, and in principle allows the developments of an infinite number of constrained assimilation algorithms. We recast data assimilation with constraints as a constrained optimization problem. However, instead of relying on a general-purpose numerical optimization algorithm to solve the constrained optimization problem, we derive an (approximate) closed form for model update, but without the need to explicitly linearize the constraint systems, in contrast to some recently proposed ensemble-based algorithms (e.g., [13,15,44]) to handle nonlinear constraints. This closed form bears similarities to the model update formula of an ordinary iterative ensemble smoother (IES) [8,11,25] applied to unconstrained data assimilation problems. As such, it becomes relatively easy to transit from an ordinary IES to the proposed constrained assimilation algorithms, and the implementation of the proposed algorithms also becomes more straightforward for those who are familiar with ensemble based assimilation algorithms. Apart from the aforementioned features, we also develop efficient methods to handle two noticed issues that would be of practical importance for ensemble-based constrained assimilation algorithms. These issues include the implementation of localization in the presence of constraints, and the (possible) high dimensionality induced by the constraint system.
In other words, our proposed DASC workflow (through GIES) contains the following four features: (1) The ability to simultaneously handle generic, nonlinear equality and inequality constraints; (2) No need for explicitly evaluating the gradients of the constraint systems with respect to the QoI; (3) The ability to handle high-dimensional constraint systems; (4) A tailored adaptive localization scheme to tackle the adverse effects of a small ensemble size, while taking into account the presence of constraint systems. These features can be considered as desirable practical benefits that help promote the applicability of the proposed DASC workflow to generic constrained data assimilation problems. To the best of our knowledge, in the literature, there do not seem to exist ensemble-based, constrained data assimilation algorithms which simultaneously possess the above-mentioned features.
We note that, it is not our intention to show the proposed DASC workflow outperforms all existing, ensemble-based constrained data assimilation algorithms. Indeed, when the gradients of the constraint systems are available, then one should expect that a constrained assimilation algorithm exploiting the knowledge of these gradients perform better than the DASC workflow, if the latter only uses less accurate, ensemble-based approximations of these gradients. 1 Instead, the main strength of the proposed DASC workflow is its wider applicability to generic constrained data assimilation problems, to some of which the existing, ensemble-based constrained data assimilation algorithms may not be applicable, due to the potential challenges in these problems, in terms of, e.g., the type(s) of the constraints, the availability of the gradients of the constraint systems, or the dimensionality of the constraint systems, etc. As an example, the first case study to be presented later considers equality constraints that involve the histograms of reservoir models. In that case, it appears challenging to evaluate the gradients of a histogram with respect to reservoir model variables. As such, most (if not all) of the existing, ensemble-based constrained data assimilation algorithms cannot be directly applied to handle such equality constraints, yet the DASC workflow works well to deal with this problem, as will be shown later.
This work is organized as follows. We start from introducing the basics of the GIES, and then proceed to explain how the GIES algorithm can be employed to handle DASC problems. Next, we illustrate the performance of the proposed constrained assimilation algorithm through one 2D and one 3D case studies. Of particular interest here is the 3D case study, which contains an inequality-constraint system whose dimension is twice that of the reservoir model (in the order of 10 5 ). We show that with the efficient methods to handle the issues of localization in the presence of constraints and high-dimensional data induced by the constraints, the proposed constrained assimilation algorithm works well in this particular case study. This observation also implies that the proposed algorithm has the potential to handle real-field reservoir characterization problems with practical constraints. We conclude the whole work with some technical discussions and future research directions.

Generalized iterative ensemble smoother (GIES)
Suppose that we are given a noisy observation system Here, g : D m → D d represents a numerical forward simulator that maps an m-dimensional vector (called model vector hereafter) m tr ∈ D m ⊆ R m to a d-dimensional element in the observation space D d ⊆ R d . The term stands for potential contamination noise that may be present in the course of acquiring the observations d o . Without loss of generality, we assume throughout this work that the simulator g is perfect.
Under the above setting, data assimilation (often as an ill-posed inverse problem) aims to find, through a certain inversion algorithm, model variables that can match the observations d o reasonably well, which is a notion close to the idea of ensemble Kalman inversion for inverse problems [16]. Prior information available for use in the inversion algorithm includes the numerical simulator g, some initial guess of m tr , and (often assumed) statistical distribution of the noise term . In practical data assimilation problems, constraints may also be available as additional sources of prior information. Throughout this work, we assume that follows a Gaussian distribution N(0, C d ), with zero mean and covariance C d .
In reservoir engineering, a class of ensemble-based algorithms, called iterative ensemble smoothers (IES), e.g., [8,11,12,25], are often adopted to tackle reservoir data assimilation (also known as history matching) problems. In their practical implementations, most of the IES algorithms can be related to a certain cost function in the form of nonlinear-least-squares (NLS). For instance, [25] derived an IES algorithm by finding, at each iteration step, an ensemble of N e reservoir models that approximately minimizes the average of an ensemble of NLS-type cost functions. Concretely, suppose that at the ith iteration step, there already exists an ensemble of model variables where j is the index of ensemble member; and N e denotes the ensemble size. Then the IES algorithm in [25] aims to find a new ensemble of model variables M i+1 ≡ {m i+1 j } N e j =1 at the (i + 1)th iteration step, by solving the following minimum-average-cost (MAC) problem: The NLS-type cost function L j (m i+1 is the sample covariance matrix induced by the ensemble M i at the ith iteration step, where S i I is a square-root matrix defined in Eq. 7 later; and γ i is a positive coefficient that changes over the iteration step, following certain preset rules [25]. Note that in the course of formulating the MAC problem, we follow the analysis scheme in [4] and choose to perturb the observations. One may also adopt a different setting similar to that in a certain ensemble square root filter and avoids perturbing the observations, which, however, is not a topic to be investigated in the current work. The cost function L j (m i+1 j |d j , M i ) consists of two parts: represents the data mismatch term that computes a weighted euclidean ( 2 ) distance between the j th perturbed data d j and the corresponding simulated data g m i+1 j ; and j is a regularization term, which calculates a weighted euclidean ( 2 ) distance between a model vector m i j at the ith iteration step and an updated one m i+1 j at the next iteration step. Under this choice, the effect of this regularization term is to penalize a large deviation of m i+1 j from m i j under a weighted 2 norm. The MAC problem in Eq. 2 can be approximately solved through the following IES model update formula [23,25]: , j = 1, 2, · · · , N e ; (4) where I N e is a N e × N e identity matrix. Note that Eq. 5 contains two equivalent ways of expressing the Kalmangain matrix K i . The first expression is the one often used in the literature, whereas the second expression can be derived by applying a certain matrix identity to the first one [23].
The reason for us to include the second expression in Eq. 5 is that later on, it will become easier for us to see the similarities and differences between the original IES and GIES algorithms.
To facilitate the discussion later, we define a generic square-root matrix where O represents a certain operator that maps a model vector into another domain. The matrix S i O is a unified representation of the square-root matrices in Eqs. 7 and 8. Indeed, for S i I in Eq. 7, O is reduced to the identity operator I, such that I (m) = m for a model vector m. Similarly, for S i g in Eq. 8, the operator O equals the forward reservoir simulator g.
The formulation of the MAC problem in Eqs. 2-3 does not take into account constraints of model variables. As a result, the corresponding model update formulae, Eqs. 4-8, will not be able to deal with the constraints. This deficiency exists in most of the IES algorithms in the literature. As the focus of this work, we will explain later how the constraints can be naturally handled by our proposed umbrella algorithm, the generalized iterative ensemble smoother (GIES), after introducing its basics.
In a recent work, [23] introduces a class of GIES algorithms. The motivation of [23] is to extend the original IES algorithm in [25] in such a way that the underlying cost function L j (m i+1 j |d j , M i ) goes beyond the form of NLS in general. To this end, [23] proposes to derive GIES algorithms by solving the following generalized minimumaverage-cost (GMAC) problem: In Eq. 11, the cost function L j (m i+1 j |d j , M i ) also consists of two terms, namely, a data mismatch term with respect to the distance metric D : D t ⊆ R t → [0, +∞), and a regularization term with respect to the regularization operator R : D s ⊆ R s → [0, +∞). Within the data mismatch term, a transform operator T : D d → D t is introduced. The presence of T aims to accommodate the possibility that the observed and simulated data may be first transformed into another domain (e.g., [26,36]) before calculating their discrepancies. In a similar way, within the regularization term, a transform operator S : D m → D s is adopted to take into account the possibility that the regularization operator may be applied to transformed model variables (e.g., [5,17,21]). In addition, as will be illustrated in the case studies later, the choice of the regularization parameter γ i j in Eq. 11 may depend on both the iteration step and the ensemble member. This is different from the situation in Eq. 3, where the regularization parameter γ i only depends on the iteration step.
It is shown in [23] that the GMAC problem defined by Eqs. 10 and 11 is approximately solved by the following model update formula (referred to as the GIES algorithm): where, given an operator O and a dummy vector variable x (and a concrete value of x at x 0 ), and T • g represents the composition of the transform operator T and the forward simulator g, such that T • g (m) ≡ T (g (m)) for a given model vector m. Meanwhile, the definitions of S i T •g and S i S follow the custom in Eq. 9. Note that, when applying Eq. 16 to evaluate the first-and second-order derivatives in Eqs. 12-15, the operator O corresponds to the distance metric D or the regularization operator R, and x to an intermediate variable in the transformed data or model space. As such, when evaluating the derivatives like there is no need for further evaluating the first-or second-order gradient of T • g or S with respect to model variables (i.e., here one should not apply the chain rule to compute the total derivatives). Normally, since one has the freedom to choose the distance metric D and the regularization operator R, here we assume that D and R are chosen in such a way that the analytic forms of the derivatives If this is not the case, then one can also follow [23] to find ensemble-based approximations to them.
As an example, in Eq. 11, if both the model and data transform operators, S and T , respectively, are identity operators, and the distance metric D and the regularization operator R, respectively, satisfy then the cost functions in Eqs. 3 and 11 coincide. In this case, with some linear algebra, one has Inserting Eqs. 19-21 into the model update formulae of GIES, Eqs. 12-15, one can see that the model update formulae are reduced to those of the original IES, Eqs. 4-8 [23].
On the other hand, L j (m i+1 j |d j , M i ) defined in Eq. 11 has a much greater generality/flexibility than that in Eq. 3, and this forms the basis for us to derive data assimilation algorithms for DASC problems, as will be shown later. As an example, [23] considered a sub-class of cost functions in the form of In Eq. 22, the data mismatch term, , is the same as that in the original IES of [25] (cf. Eq. 3 of this work). However, as Eq. 23 indicates, the regularization term of Eq. 22 is presented as a mixture of where w k , B k and S k are mixture coefficient, weight matrix and (model) transform operator, respectively, associated with the kth sub-term; p k and q k are some positive real numbers, with the meaning of x q k p k being to raise the p k norm of a dummy vector x to its q k th power. It can be shown that the regularization term, in the cost function of the original IES (also cf. Eq. 3) corresponds to the choice K mix = 1 and p 1 = q 1 = 2 [23]. Meanwhile, in general p k and q k take positive values, and they are neither required to be any integer numbers, nor be equal to each other. As such, there could be an infinite number of possible combinations of the pair (p k , q k ), and this fact already means that in principle one can derive from the cost function in Eqs. 22 and 23 an infinite number of GIES algorithms, called q p -GIES hereafter, whose underlying cost functions would go beyond the form of NLS in general. For demonstration, [23] investigated a group (up to 30+) of q p -GIES algorithms, and showed that many of them outperformed the original IES of [25] in two numerical examples. For brevity, we do not provide further information of the q p -GIES algorithms here, and refer readers to [23] for more details.

A GIES-Based Approach to DASC Problems
Here we consider a data assimilation problem with certain constraints, in which the observation system is the same as that in Eq. 1, whereas the constraints may consist of both equality and inequality ones. Without loss of generality, for a given model vector m, let us suppose that the equality and inequality constraints, respectively, are described by the following systems: where both f eq ("eq" for equality) and h in ("in" for inequality) could be some nonlinear vector mappings in general. For brevity, we also refer to equality and inequality constraints as EQ-constraints and IN-constraints, respectively. As shown in the previous subsection, both the original IES and GIES algorithms can be derived by solving certain stochastic optimization problems, but without yet taking into account possible constraints on model variables. Following this optimization-theoretic perspective, including the constraints into data assimilation can then be recast as some constrained optimization problem. Bearing this notion in mind, we propose to derive a class of constrained GIES algorithms by solving the following GMAC problem: Equations 27 and 28 indicate that the cost function underlying a constrained GIES algorithm contains an "effective" data mismatch term and a regularization term. In contrast to the data mismatch term in the unconstrained data assimilation problem (cf Eq. 11), however, now the "effective" data mismatch term needs to incorporate the impacts of constraints on the assimilation algorithm. Therefore, in Eq. 28, the "effective" data mismatch term is in the form of a mixture of three sub-terms, corresponding to the observation and constraint systems, Eqs. 1, 24 and 25, respectively. The notations in Eq. 28 are similar to those in Eq. 11, but with some additional subscripts like "d" ("d" for data), "eq" and "in" present for distinction of different sources of information. For instance, D index and T index , index ∈ {d, eq, in}, stand for the distance metric and the transform operator of respective (observation or constraint) system. Without loss of generality, we assign a mixture coef- whereas α i j and β i j are mixture coefficients of respective constraint systems. Since we are dealing with soft constraints, we do not treat α i j and β i j as Lagrange multipliers. Therefore, we do not adopt the Karush-Kuhn-Tucker (KKT) conditions [30] to choose the values of α i j and β i j . Instead, α i j and β i j are treated as scalars that specify the relative weights of individual data mismatch terms, and their values will be determined in a way adaptive to the iteration step and ensemble member, using a criterion similar to that in [23], which will be illustrated in numerical case studies later.
Similar to the way to derive the q p -GIES algorithms, the model update formulae, Eqs. 12-15, can be applied to derive constrained assimilation algorithms, with some changes to account for the mixture of data mismatch terms. Specifically, using the analysis in Appendix D of [23], for the model update formulae in Eqs. 12-15, the following parts will be reduced to: Inserting Eqs. 29-31 into Eqs. 12-15, we obtain the following model update formula for DASC problems: For reference later, we call algorithms derived from Eq. 32 constrained GIES (C-GIES). Clearly, when no constraint is considered (e.g., by setting α i j = β i j = 0), the C-GIES will reduce to the GIES algorithms in Eqs. 12-15. On the other hand, the impacts of EQ and/or IN-constraints will be promoted by increasing the values of α i j and/or β i j . In practical data assimilation problems, the distance metrics D d , D eq and D in , regularization operator R, and transform operators T d , T eq , T in and S in Eq. 32 are chosen by the users. This implies that in principle there could also be infinitely many C-GIES algorithms, similar to what we have previously mentioned for the q p -GIES algorithms. In the current work, however, our main objective is not to show the richness of the C-GIES algorithms. Instead, we focus more on explaining how the constraints can be incorporated into data assimilation through a GIES-based framework, and demonstrating the benefits of including such constraints. Consequently, in the sequel, we will only consider a particular C-GIES algorithm which is close to the original IES algorithm in [25]. In fact, this C-GIES algorithm will recover the original IES algorithm if there is no constraint present (e.g., by setting α i j = β i j = 0). Therefore, under this setting, the impacts of the constraints will become manifest by comparing the performance of this C-GIES and the original IES algorithms.
Specifically, in the chosen C-GIES algorithm, we let the distance metric D d and the regularization operator R be the same as those in Eqs. 17 and 18, respectively. The choices of the distance metrics D eq and D in are worthy of more attention, and will thus be discussed separately later. Meanwhile, we let T d , T eq , T in and S all be identity operators. With these choices, Eq. 32 is reduced to Note that in Eq. 33, We make the value 0 explicitly present in the update formula to indicate that 0 is treated as the perfect measurements in the constraint systems.
Regarding the choice of the distance metric D in for INconstraints, essentially we utilize the concept of barrier function [6,30]. Suppose that m i j already satisfies the IN-constraints, then the idea of barrier function is to choose a continuous mapping D in in such a way that moves away from the boundary, as a way to honor the IN-constraints. In the current work, we choose the following form for D in : for a given dummy vector x ≥ 0 (e.g., In Eq. 34, a > 0 is a constant vector in the same size as x; log (•) is a vector function which applies the natural logarithmic function to each element of •; and 1 len(x) is a vector whose elements are all equal to 1, and whose length is the same as that of the vector x. As desired, when any of the elements of x + a approaches 0, then D in (x) tends to +∞; On the other hand, if all elements of x+a are relatively far away from 0, then the value of D in (x) will decrease. The constant vector a is introduced here for numerical stability, to prevent possible divisions by zero in Eqs. 35 In Eqs. 35 and 36, ./ and∧2 stand for element-wise division and square operators applied to vectors, such that 1 len(x) ./(x + a) = [1/(x 1 + a 1 ), 1/(x 2 + a 2 ), · · · ] T and where x k , a k (k = 1, 2, · · · ) are elements of x and a, respectively. The diag operator converts a vector to a diagonal matrix, in such a way that the diagonal elements of the diagonal matrix are filled in by the elements of that vector. For the choice of the distance metric D eq , we adopt a notion opposite to barrier function. Instead of propelling the constraint system (h in m i+1 j ) away from the boundary 0, we want D eq to attract the constraint system (f eq m i+1 j ) towards 0. For this reason, we call D eq a "channel function". In this work, D eq is chosen as follows: for a given dummy vector x (e.g., x = f eq m i j ), where b is also a constant vector close to zero; |x| and log (x) are element-wise operators that take the absolute value, and natural logarithm of a vector x, respectively. Roughly speaking, D eq (x) will become relatively large if the element magnitudes of x are big, and tend to decrease when the element magnitudes of x are reduced. Similar to Eqs. 35 and 36, it can be shown that, the gradient and Hessian of D eq with respect to x in Eq. 37 is given by where sgn(x) ≡ [sgn(x 1 ), sgn(x 2 ), · · · ] T returns the elements' signs of x, and× denotes element-wise (Hadamard) product, such that x×y = [x 1 × y 1 , x 2 × y 2 , · · · ] T , with x k and y k (k = 1, 2, · · · ) being the elements of x and y, respectively. In numerical calculations, we follow the custom to set sgn(0) = 0, which implies that in Eqs. 38 and 39, divisions by zero will occur at x = 0. To mitigate this potential problem, in our implementation, we approximate the gradient and Hessian in Eqs. 38 as follows: where > 0 is a (relatively small) positive scalar. Again, the values of b and will be specified in numerical case studies later.
For the distance metrics D in and D eq defined by Eqs. 34 and 37, respectively, one feature is that their resulting Hessian in Eqs. 36 and 39 (or (41)) are diagonal matrices. This feature would be particularly useful if the corresponding constraint systems are of very high dimensions (which is the situation in one of our numerical case studies later). To see this point, let us take the Hessian ∇ 2 D in [x] as an example. Being a diagonal matrix means that ∇ 2 D in [x] can be manipulated as a sparse matrix, which would consume considerably less computer memory than a non-diagonal, full Hessian matrix. In addition, the C-GIES update formula (33) involves certain the matrix products can be simplified, 2 which also implies reduced computational costs. On the other hand, it should be noted that in general, there could be many good choices for D in and D eq , while those defined by Eqs. 34 and 37 may not necessarily be the optimal ones. Investigating the optimal forms of D in and D eq is beyond the scope of the current work, and is thus not further considered hereafter. 2 For instance, for a vector x and a matrix A ≡ [a 1 , a 2 , · · · ], where a k (k = 1, 2, · · · ) are the column vectors of A, then it can be shown that diag (x) A = [x, x, · · · ]× A = x× a 1 , x× a 2 , · · · , where× stands for element-wise (Hadamard) product.

Localization in the C-GIES algorithm
As aforementioned, one of the core ideas used in this work to derive constrained assimilation algorithms is the notion of "perfect measurements" [34], where the constraint systems are treated as forward observation operators without any measurement errors. In the context of ensemble data assimilation, localization is an auxiliary technique often adopted to mitigate the problem of ensemble collapse and strengthen the performance of assimilation algorithms [7,10,28]. When applying localization to "perfect measurements" from the constraint systems, however, there are two issues worthy of special attention. One issue is that the "perfect measurements" might not possess any physical locations, such that (physical) distance-based localization methods [7,10] may not be applicable. In fact, as one may notice in the C-GIES update formula (32), or its special case (33), in general it is the gradients in the original IES algorithm algorithm (cf. Eqs. 4 and 5). 3 As such, for the umbrella GIES algorithm or its descendant C-GIES algorithm for DASC problems, in general it would be more appropriate to exploit the gradients of respective observation or constraint systems, rather than the locations of observations, for localization.
Another problem is that even though the number of real observations (e.g., production data) may not be big, the dimensions of the constraint systems could be very high.
As an example, one may consider box constraints imposed on all petro-physical parameters distributed on reservoir gridblocks, which is the experiment setting to be adopted in one of the case studies later. In large scale history matching problems, a straightforward application of the conventional localization technique would thus consume a huge amount of computer memory, since it involves a matrix (called tapering matrix hereafter) in the dimension of model size by effective data size, where by "effective data size" we mean the number of real measurements plus the dimensions of constraint systems. 3 In fact, it can be shown that C To simultaneously tackle the aforementioned two problems, namely, (1) localization based on information of gradients, rather than physical locations of real and "perfect" measurements; and (2) localization involving big models and big effective data induced by constraints, there are two possibly strategies. One strategy is to follow the recent work of [37], which extends correlation-based Kalmangain type localization [24,28] to correlation-based local analysis. In the context of C-GIES, one can compute the sample correlations between an ensemble of reservoir models and the corresponding ensemble of terms related to the gradients ( . Using the information of these correlations would eliminate the need for physical locations and render a few additional practical advantages, e.g., the ability to handle both non-local observations and the spatialtemporal characteristics of observations, see the elaborations in [24,28]. Meanwhile, local analysis can be employed to tackle the issue of big model and big datasets. Instead of performing a single global update, local analysis conducts a series of local updates, each of them only involving a small number of model variables and effective data that are grouped based on the computed sample correlations [37].
Another practical strategy is still based on correlationbased Kalman-gain localization. To deal with the presence of big datasets in localization, however, we do not directly handle the big datasets themselves. Instead, we follow the idea in [29] and project the big datasets onto certain ensemble subspaces. After projections, we apply correlation-based localization to terms related to the projected gradients, e.g., in Eq. 33. All these projected gradients are column vectors in the dimension of N e × 1, with N e being the ensemble size, hence the sizes of projected gradients are independent of the dimensions of the original gradients. One more issue to notice here is that in Eq. 33, the term in general depends on individual ensemble members m i j . To separate the effects of observed data and constraints on model updates, at each iteration step i, we choose to multiply all the projected gradients by this matrix factor F i j (in the dimension of N e × N e ), and then use the ensemble of the following products (all in the dimension of N e × 1) together with the ensemble of reservoir models m i j (for j = 1, 2, · · · , N e ) to compute three sets of correlation fields, denoted by C i d , C i eq and C i in (all in the dimension of m × N e ), respectively. Then following the previous work [24,28], we can construct three tapering matrices, denoted by T i d , T i eq and T i in (all in the dimension of m × N e ), based on C i d , C i eq and C i in . These tapering matrices are applied to the model update formula (33), such that the model update formula with correlation-based localization reads as follows: In the current work, we adopt the projection-based strategy to implement correlation-based Kalman-gain localization, as this requires minimal changes of the C-GIES update formula in Eqs. 32 or 33, in contrast to the strategy based on correlation-based local analysis. Throughout this work, we use the localization scheme, labeled as RndShfl-GC, in [24] to construct the tapering matrices T i d , T i eq and T i in . For brevity, however, we skip the details of RndShfl-GC, and refer readers to [24] for more information.

Experiment settings of the 2D case study
This section focuses on illustrating the performance of the chosen C-GIES algorithm, (33), in a 2D channelized reservoir characterization problem, which was investigated in the previous work [23]. The size of the 2D reservoir model is 45 × 45, in terms of the number of gridblocks. The unknown parameters of the reservoir model contain x-dimensional permeability (PERMX) distributed on all reservoir gridblocks, whereas other parameters (e.g., porosity) are assumed to be known. As such, the total number of parameters to be estimated in this 2D case study is 2025. Figure 1 indicates the reference PERMX map, where the positions of 8 injectors (I1 -I8) and 8 producers (P1 -P8) are marked. The reference model contains two facies, sand and shale, with their corresponding PERMX Fig. 1 Reference model used in the 2D synthetic case study, with well positions marked by white ellipses. Well names (in green) starting with the letter "I" stand for injection wells, while those with the letter "P" for production wells values being 10000 md and 500 md, respectively. The reference reservoir model is simulated using ECLIPSE © 100 black oil simulator, with a total simulation time of 3800 days. The production data generated during the first 1900 days (history-matching period), including well oil/water production rates (WOPR/WWPR) from 8 producers, and well bottom-hole pressures (WBHP) from 8 injectors, are used to history-match an ensemble of reservoir models, whereas the corresponding production data during the second 1900 days (forecast period) are reserved to crossvalidate the performance of different history matching algorithms. The total numbers of production data during history matching and forecast periods are both 240. To mimic the situation in practical history matching problems, we add (zero-mean) Gaussian white noise to all the production data during the history matching period. The standard deviations (STDs) of the noise in both WOPR and WWPR data are 0.2236 m 3 /day, while those of the noise in WBHP data are 0.2646 bar. On the other hand, during the forecast period, no noise is added to the production data.
An ensemble of 100 PERMX maps is generated by the SNESIM algorithm [38] to initialize ensemble-based history matching algorithms. Figure 2 shows the mean and STD maps of the initial ensemble.
In this case study, we consider one IN-and one EQconstraint systems. The IN-constraint system imposes a box constraint on the range of the PERMX value m k on the kth gridblock (k = 1, 2, · · · , 2025), such that 100 md ≤ m k ≤ 15000 md. This box constraint then leads to the following IN-constraint system 100 − m k ≤ 0; (47) for k = 1, 2, · · · , 2025. As a result, the dimension of the IN-constraint system is 2 × 2025 = 4050. The EQ-constraint system aims to impose a constraint on the shape of the histogram of an estimated PERMX map. To this end, we assume that the ratio of sand and shale facies in the reference PERMX map (cf Fig. 1), hence the corresponding histogram, is (roughly) known (e.g., based on well log data). The first row of Fig. 6 shows the histogram of the reference PERMX map and that corresponding to the mean of the initial ensemble of PERMX maps. During history matching, we compare the histogram of each estimated PERMX map to that of the reference PERMX map bin by bin. For a quantitative comparison, in each bin, we count the number of permeability values falling into that bin, while the total number of bins is 50. For convenience of  [m 1 , m 2 , · · · , m 2025 ] T ) to a set of numbers, through the steps: PERMX map → histogram → number of permeability values in each bin. Following this setting, the EQ-constraint system is composed as follows: whereas the dimension of the EQ-constraint system equals the number of bins of a histogram (50 in this case). Here, we note that due to the possible conflict with other objectives (e.g., that of reducing data mismatch), these EQ-constraints may not be strictly satisfied during the history matching process. Nevertheless, as will be shown later, the presence of these EQ-constraints does tend to help improve the performance of history matching.
Here we compare the performance of a few IES algorithms. The base case for comparison is the original IES algorithm (cf Eqs. 4-8) combined with a simple truncation strategy to handle a box-constraint m lb ≤ m k ≤ m ub (k = 1, 2, · · · , 2025 in the current case study), where m lb (= 100) and m ub (= 15000) represent lower and upper bounds, respectively, for a model variable m k . With the truncation strategy, whenever an m k estimated by the original IES algorithm is less than m lb , then we set m k = m lb ; Likewise, whenever an estimate m k is larger than m ub , then we set m k = m ub . For reference later, the base-case algorithm is referred to as O-IES.
The other algorithms in comparison include the C-GIES algorithms (cf. Eq. 33) that take into account the INconstraints (47-48) and/or the EQ-constraints (49). For distinction later, we refer to these algorithms as C-GIES-IN, C-GIES-EQ and C-GIES-(IN+EQ), respectively. By "C-GIES-IN", we mean that the C-GIES algorithm only deals with the IN-constraint system. Similarly, the C-GIES-EQ algorithm only considers the EQ-constraint system, and the C-GIES-(IN+EQ) algorithm takes into account both constraint systems. Note that all these three types of C-GIES algorithms are also equipped with the truncation strategy, such that the differences of history-matching performance among these IES algorithms would reflect the impacts of the constraints incorporated through our proposed framework.
The C-GIES algorithms based on Eq. 33 require to determine the weight coefficients α i j , β i j and γ i j at each iteration step. In the current work, we follow the strategy in [25], and set α i j as follows: where w 1 is a positive constant not too far away from 1, and the trace operator calculates the trace of a matrix. The rationale behind Eq. 50 is that when the EQ-constraint system is present, we want its influence on model update, to be comparable to that of the data mismatch, in terms of The value of β i j is determined in a similar way, as follows: where w 2 is another positive constant not too far away from 1. The values of (w 1 , w 2 ) adopted for the four algorithms in comparison, namely, O-IES, C-GIES-IN, C-GIES-EQ and C-GIES-(IN+EQ), can be found in Table 1.
As for the regularization parameter γ i j , it is calculated as follows: Note that in Eq. 54 we have used the fact that trace I N e = N e , where w i 3 is a positive scalar that adaptively changes over the iteration step. Following [25], we let w 0 3 = 1, w i+1 3 = 0.9 × w i 3 if the average data mismatch over ensemble members (to be defined later) at the (i + 1)th iteration step is lower than that at the ith iteration step; otherwise, we set w i+1 During history matching, the maximum number of the iteration steps for all algorithms is 50, but the iteration process may stop earlier, if the change of the average data mismatch values during two consecutive iteration steps is less than 0.01%.
The use of the distance metrics D in and D eq also requires to specify constants a, b and , cf. Eqs. 34-41. In the experiments, we set a = b = 0.1 × 1 and = 0.001.
To cross-validate the performance of history matching algorithms, we adopt the following two measures, namely, forecast data mismatch and root mean square error (RMSE). Data mismatch uses a weighted 2 2 metric in the data space to measure the distance between the observed and the simulated data. Given an ensemble of reservoir models, a forward reservoir simulator g and Table 1 Performance of the four history-matching algorithms in the 2D case study. The performance is measured in terms of data mismatch (mean ± STD) during the forecast period, and RMSE (mean ± STD), which are computed using the ensembles of reservoir models at the final iteration steps History-matching data mismatch Forecast data mismatch RMSE of PERMX Values of (mean ± STD) ×10 3 (mean ± STD) ×10 3 (mean ± STD) ×10 3 (w 1 , w 2 ) For comparison, the quantities with respect to the initial ensemble are also included, and the relative changes of the average quantities between the initial and final ensembles are provided in separate parentheses. In addition, we note that the values of (w 1 , w 2 ) are not applicable (N/A) to the initial ensemble the observed field dataset d o , an ensemble Ξ(M i ) of data mismatch values is computed as follows: where C d is the observation error covariance matrix associated with the dataset d o (also cf. Eq. 3 and the corresponding text for the definition of C d ).
Similarly, RMSE adopts a (scaled) 2 metric in the model space to examine the distance between the reference and the estimated reservoir models. Given an m-dimensional reference model m ref and of reservoir models, an ensemble Ω(M i ) of RMSE is calculated as follows: Table 1 reports the data mismatch values (mean ± STD) during both history-matching and forecast periods, and RMSE values (mean ± STD), with respect to both the initial ensemble, and the final ensembles obtained by four historymatching algorithms: O-IES, C-GIES-IN, C-GIES-EQ and C-GIES-(IN+EQ). The values of the pair (w 1 , w 2 ) used by these history-matching algorithms are also listed therein. In comparison to the initial ensemble, all four algorithms are able to achieve better history-matching performance, in terms of both lower forecast data mismatch and RMSE values. In addition, compared with the base-case O-IES algorithm, the three C-GIES algorithms achieve enhanced performance, by exploiting either EQ-constraints or INconstraints, or both. As a matter of fact, in this particular case study, it is the C-GIES-(IN+EQ) algorithm with both types of constraints that achieves the best performance in terms of both forecast data mismatch and RMSE, manifesting the benefits of incorporating both types of constraints into history matching through the proposed workflow.  As in Table 1, the relative changes of the average values of barrier and channel functions between the initial and final ensembles are provided in separate parentheses  Similarly, Table 2 shows the values (mean ± STD) of barrier and channel functions (cf. Eqs. 34 and 37), with respect to the initial ensemble, and the final ensembles obtained by the four history-matching algorithms. Note that these function values are calculated only at the posthistory-matching stage, and are not utilized during history matching. Compared to the initial ensemble, one can see that the final ensembles obtained by the four historymatching algorithms end up with higher barrier and channel function values, which is different from the situation in Table 1, wherein the final ensembles achieve lower data mismatch values. This contrast suggests that reducing data mismatching and enforcing EQ-and/or In-constraints could be conflicting objectives. On the other hand, by comparing the base-case algorithm with the three C-GIES algorithms, the benefits of incorporating EQ-and/or In-constraints through the proposed workflow will be better appreciated. For instance, after incorporating the EQ-constraints, one can see that the channel function values of the C-GIES-EQ algorithm tend to be noticeably lower than those of the O-IES algorithm; Likewise, after incorporating the INconstraints, the barrier function values of the C-GIES-IN algorithm also tend to be lower than those of the O-IES algorithm. A more complicated situation happens in the C-GIES-(IN+EQ) algorithm, wherein the barrier function values, as expected, tend to be lower than those of the O-IES algorithm, but the channel function values, surprisingly, tend to be higher than those of the O-IES algorithm. A possible explanation to this situation is that in this particular case study, the objective of honoring the box-constraints may also conflict with the objective of changing the shape of a histogram.

Numerical results of the 2D case study
For further illustration, Fig. 3 presents the box plots of data mismatch values (left column) computed using the ensembles of reservoir models at different iteration steps of the O-IES, C-GIES-IN, C-GIES-EQ and C-GIES-(IN+EQ) algorithms; and the corresponding RMSE values (right column) at different iteration steps. As one can see from these plots, both data mismatch and RMSE values tend to decrease as the iteration process goes on. In consistency with the results in Table 1, it is the last ensemble obtained by the C-GIES-(IN+EQ) algorithm that tends to achieve the lowest RMSE values. Figure 4 shows the production-data profiles, with respect to WOPR in P2 and WWPR in P4. The forecast production data (blue curves) are computed using the final ensembles of the four history-matching algorithms. Overall, for WOPR in P2, the forecast production data generated by the four history-matching algorithms match the reference (orange curve) reasonably well. For WWPR in P4, however, higher data mismatch is spotted. Among the four history-matching algorithms, the C-GIES-(IN+EQ) algorithm has the best performance, as is reported in Table 1. In Fig. 5, we report the mean PERMX maps of the final ensembles obtained by the four history-matching algorithms, as well as the reference PERMX map and the mean map of the initial ensemble. All these four algorithms do not perform well in term of preserving the connectivities of the channel bodies, although all of them appear to be able to capture the main geological structures in the reference map. The history-matching performance is expected to be further improved by adopting a certain more sophisticated model re-parameterization strategy, e.g., through deep unsupervised learning [5]; or by incorporating into the cost function (e.g., in Eq. 11) a more suitable regularization term [23]. Either investigation, however, is beyond the scope of the current work.
Finally, Fig. 6 compares a set of histograms calculated from the reference PERMX map, the mean maps of the initial ensemble and the final ensembles obtained by the four history-matching algorithms. In line with the results in Table 2, the histogram of the C-GIES-(IN+EQ) algorithm appears to be more different from the others, e.g., apart from the two modes located at the correct values 500 and 10000 (as in the reference histogram), that histogram has two additional peaks with substantial heights whose locations are inconsistent with those in the reference histogram. In contrast, the histograms of the O-IES, C-GIES-IN, and C-GIES-EQ algorithms roughly have bi-modal distributions, and exhibit more similarities to those of the reference PERMX map.

Experiment settings of the 3D case study
In the second numerical example, we apply IES algorithms to the Brugge benchmark case [31], in which the reservoir model is 3D, in the dimension of 139 × 48 × 9. Therefore, there are 60048 gridblocks in total, and 44550 of these gridblocks are active. The parameters to be estimated include x-, y-and z-dimensional permeability values (denoted by PERMX, PERMY, PERMZ, respectively) in the scale of natural logarithm, and porosity values (denoted by PORO) distributed on these 44550 active gridblocks. Hence, the total number of unknown parameters is 4 × 44550 = 178200. In this particular case study, the histograms of the parameters do not appear to provide much information of the geological structures, and it does not seem to be realistic to assume that we know the histograms of the reference reservoir model. As such, here we do not consider histogram-based equality constraints as in the previous 2D example. Instead, we only consider a set of box constraints imposed on PERMX, PERMY, PERMZ and PORO values. Specifically, for PERMX values, the box constraints are that −4 ≤ ln(m permx ) ≤ 9.2, where ln is the natural logarithmic function, m permx represents the PERMX value on an active gridblock; Similarly, the box constraints for other types of petro-parameters are −4 ≤ ln(m permy ) ≤ 9.2; −7 ≤ ln(m permz ) ≤ 6.9 and 0.07 ≤ m poro ≤ 0.3. As a result, the IN-constraint system is almost the same as that in the 2D case system, except that the lower and upper bounds, and the dimensionality are different. In the current case study, the dimension of the IN-constraint system is 2 × 178200 = 356400, twice the reservoir-model size. To handle the high dimensionality of the IN-constraint system, the diagonal form of the Hessian matrix in Eq. 36 proves to be particularly useful for reducing the consumption of computer memory, otherwise it becomes prohibitively expensive to directly manipulate the Hessian matrix in the dimension of 356400 × 356400 (which requires around 900 GB memory in MATLAB © ).
The initial ensemble of reservoir models is taken from the Brugge benchmark dataset, which contains 104 ensemble members. One of the members is chosen as the reference model to generate observed production data at 20 producers (labeled as BR-P-1, BR-P-2, ..., BR-P-20, or simply as P1, P2, ..., P20) and 10 injectors (labeled as BR-I-1, BR-I-2, ..., BR-I-10, or simply as I1, I2, ..., I10). The production period is 10 years, and for history matching we use the following types of production data reported at 20 time steps: well oil production rates (WOPR) and water cuts (WWCT) from the 20 producers, and the bottom hole pressures (WBHP) from all 30 wells. As a result, the total number of production data is 1400. The production data are contaminated by a certain amount of zero-mean Gaussian white noise. For WORP and WWCT data, the STDs of the noise components are 10% of the magnitudes of the pure production data generated by the reference model. In case that the magnitude of a data point is zero, then the STD is set to be 10 −6 instead, to avoid the possible issue of devision by zero when calculating data mismatch values. On the other hand, for WBHP data, the STDs of the noise components are all set to 1 bar. In this case study, we use all available data for history matching, so there is no forecast period. As such, we adopt RMSE for performance cross-validation.
In the sequel, we compare the performance of two history-matching algorithms. One algorithm is the original IES, but combined with the truncation strategy to honor the box constraints. This algorithm is again referred to as O-IES, as in the 2D case study. Another algorithm is a C-GIES with the IN-constraint system induced by the box constraints. Following the custom in the 2D case study, this algorithm is referred to as C-GIES-IN, and we note that here the C-GIES-IN is also equipped with the truncation strategy.
Since there is no EQ-constraint system involved, the weight coefficient w 1 (cf. Eq. 50) is set to 0; the weight coefficient w 2 (cf. Eq. 52) associated with the IN-constraint system is set to 5; and the weight coefficient w i 3 (cf. Eq. 54) will adapt to the iteration steps, following the same rule as in the 2D case study. During history matching, the maximum number of the iteration steps for O-IES and C-GIES-IN Table 3 Performance of the two history-matching algorithms in the Brugge case study. The performance is measured in terms of RMSE (mean ± STD), which are computed using the ensembles of reservoir models at the first and final iteration steps Other quantities reported here include data mismatch during history matching, and the value of barrier function (in the form of mean ± STD), with respect to both the initial and final ensembles (and as in the tables of the previous case study, we also include the relative changes of these quantities in parentheses). For RMSE, the values are calculated with respect to PERMX, PERMY, PERMZ (in the scale of natural logarithm), PORO, and the combination of all these variables, respectively  algorithms is 10, but these algorithms may stop earlier if the change of the average data mismatch values during two consecutive iteration steps is less than 0.01%. Table 3 examines the values of data mismatch, barrier function of the IN-constraint system, and RMSE (separate for PERMX, PERMY, PERMZ, PORO, and also the combination of all model variables), with respect to both the initial ensemble, and the final ensembles obtained by the O-IES and C-GIES-IN algorithms. All these quantities are reported in the form of mean ± STD. In comparison to the values of data mismatch, barrier function and RMSE of the initial ensemble, those of the final ensembles tend to be lower, indicating that the performance is improved by using either history-matching algorithm, and that in this particular case study, it is possible to simultaneously reduce both data mismatch and barrier function values during history matching. In comparison to the O-IES algorithm, Fig. 10 As in Fig. 9, but for PORO maps on Layer 2 the C-GIES-IN algorithm tends to result in lower data mismatch and barrier function values (meaning that the box constraints are better honored). In terms of RMSE, the C-GIES-IN algorithm leads to lower (average) estimation errors for PERMX, PERMY and PERMZ, but slightly higher errors for the estimation of PORO. The overall RMSE (by putting all model variables together) induced by the C-GIES-IN algorithm also tends to be smaller than that of the O-IES algorithm.

Numerical results of the 3D case study
Next, Fig. 7 shows the box plots of data mismatch and RMSE values of PERMX, PERMY, PERMZ and PORO at different iteration steps of the O-IES and C-GIES-IN algorithms. In both history-matching algorithms, all the examined quantities tend to decrease as the iteration step increases. As can be expected from the results in Table 3, in comparison to the O-IES algorithm, the C-GIES-IN algorithm leads to lower data mismatch and RMSEs of the estimated permeability values, but results in slightly higher errors in the estimated porosity values.
Moreover, Fig. 8 reports the profiles of some production data, including WBHP in P2, WOPR in P11, and WWCT in P12. As can be seen there, in general, the forecast production data of the C-GIES-IN algorithm (right column) tend to match the observed data better than the O-IES algorithm (left column) does.
For an inspection on the reservoir models, Figs. 9 and 10 show the reference and mean PERMZ and PORO maps, respectively, on Layer 2. In comparison to the mean PERMZ map of the initial ensemble, more substantial differences can be found in the mean PERMZ map of the final ensemble, obtained by either the O-IES or the C-GIES-IN algorithm. In contrast, the differences among the mean PORO maps of the initial and final ensembles appear less noticeable. The possible reason behind this observation is that in this particular case study, the changes of permeability values are more sensitive to production data than the changes of porosity values, as also noticed in the previous work [26]. On the other hand, by comparing the final mean PERMZ and PORO maps to those of the reference model, one can see that the resemblances of the final estimations to the reference maps may not be satisfactory. One can improve the estimation quality by introducing additional sources of information, e.g., 4D seismic data, into history matching (see, for example, [27,28]), which, however, is a topic beyond the focus of the current work.

Discussion and conclusion
This work presents an ensemble-based workflow to handle data assimilation with soft constraints (DASC). The main idea here is to treat both equality-and inequality-constraint systems as some perfect observation operators, and then augment these constraint and the original observation systems. Under this setting, DASC aims to simultaneously assimilate multiple sources of information into model variables. From the optimization-theoretic perspective, DASC can be recast as a generalized minimum-averagecost (GMAC) problem, in which the cost function contains a data mismatch term that consists of three discrepancy sub-terms, induced by the original observation system, the equality-constraint system, and the inequality-constraint system, respectively. The generic GMAC problem can be approximately solved by the generalized iterative ensemble smoother (GIES), and in the context of constrained data assimilation, it leads to a constrained GIES (C-GIES) algorithm, where we choose the square of a weighted 2 norm in the regularization term of the cost function. Although not investigated in the current work, in principle one can develop infinitely many C-GIES algorithms by adopting a different regularization term, following the example of the class of q p -GIES algorithms in [23]; or in a similar way, by changing the distance metrics D d , D eq and D in or the transform operators T d , T eq and T in in the data mismatch term (cf. Eq. 28).
The presence of constraints leads to two potential issues in practice. One is regarding the possible high dimensionality induced by the constraints, and the other is about localization for the constraint systems. To tackle the first issue, we propose to choose the distance metrics D eq and D in in such a way that the resulting Hessians are diagonal matrices. To deal with the second issue, we apply correlation-based adaptive localization to the terms related to the projected gradients, which leads to some tapering matrices in the dimension of model size by ensemble size, and is useful for handling a large amount of "effective" data, from either the original observation system, or the constraint systems, or both.
We then apply the proposed C-GIES algorithm to one 2D and one 3D case studies. The numerical results indicate that properly incorporating the constraints does help improve the performance of data assimilation. In particular, the experiment settings of the 3D example are close to those of a real field case study, and we have observed that the proposed C-GIES algorithm is able to efficiently handle both the issues of high dimensionality induced by the constraint systems, and localization for the constraint systems, while achieving reasonably good performance. This implies that the proposed C-GIES algorithm has the potential to be used in real field case studies.
Finally, we bring up two remaining open problems in this line of research. One problem is how to optimally allocate relative weights to the discrepancy sub-terms with respect to the original observation system and the constraint systems, respectively; and the other is that, at a higher level, how to choose an optimal cost function to deduce a suitable GIES or C-GIES algorithm for a given data assimilation problem.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.