Combining CSEM or gravity inversion with seismic AVO inversion, with application to monitoring of large-scale CO2 injection

A sequential inversion methodology for combining geophysical data types of different resolutions is developed and applied to monitoring of large-scale CO2 injection. The methodology is a two-step approach within the Bayesian framework where lower resolution data are inverted first, and subsequently used in the generation of the prior model for inversion of the higher resolution data. For the application of CO2 monitoring, the first step is done with either controlled source electromagnetic (CSEM) or gravimetric data, while the second step is done with seismic amplitude-versus-offset (AVO) data. The Bayesian inverse problems are solved by sampling the posterior probability distributions using either the ensemble Kalman filter or ensemble smoother with multiple data assimilation. A model-based parameterization is used to represent the unknown geophysical parameters: electric conductivity, density, and seismic velocity. The parameterization is well suited for identification of CO2 plume location and variation of geophysical parameters within the regions corresponding to inside and outside of the plume. The inversion methodology is applied to a synthetic monitoring test case where geophysical data are made from fluid-flow simulation of large-scale CO2 sequestration in the Skade formation. The numerical experiments show that seismic AVO inversion results are improved with the sequential inversion methodology using prior information from either CSEM or gravimetric inversion.


Introduction
Storing CO 2 in large, saline aquifers is considered one of the remedies for greenhouse-gas emission. Cost-efficient CO 2 sequestration in large aquifers with an aim to store a large amount of CO 2 over a restricted period of time will likely involve high injection rate spread over few injection wells. The combination of high injection rate and few injection wells can lead to hazardous pressure build-ups. If pressure develops over certain thresholds, situations like, e.g., near-well fracturing and fault reactivation can occur, with possibly severe consequences. To be able to detect areas with potential hazardous over-pressure, especially far from the wells, periodical geophysical monitoring surveys Svenn Tveit svtv@norceresearch.no 1 NORCE Norwegian Research Centre, 5020, Bergen, Norway 2 Norwegian Geotechnical Institute (NGI), 0855, Oslo, Norway 3 Octio AS, Kanalveien 119, 5068, Bergen, Norway have to be conducted. Geophysical monitoring is also important for verifying CO 2 plume placement and fluidflow simulations, and detecting leakage to the surface.
The most widely used geophysical monitoring method is the seismic method. A common technique for seismic inversion is amplitude versus offset (AVO) [15], where elastic parameters are estimated from seismic reflection coefficients. Seismic time-lapse signals are sensitive to changes in subsurface elastic properties, where changes due to contrasts in both saturation and pressure are important for CO 2 monitoring. Discrimination between saturation and pressure effects is discussed, e.g., in [16,33,47,58,68,73,74]. Obtaining reliable saturation and pressure estimates from AVO data can be difficult, due to data and modelling errors, poor conditioning of the linearized AVO system, and significant uncertainties in the petroelastic model. To increase the reliability of inversion results, combining seismic data with information from complementary geophysical data types is an option. The complementary data types considered in this paper are controlled source electromagnetic (CSEM) and gravimetric data.
The CSEM method has been used extensively in exploration, to detect hydrocarbon reserves, see, e.g., [20,24,44,62,69]. Electromagnetic (EM) signals are sensitive to the subsurface electric conductivity, which depends largely on the fluid content in the pores. The conductivity contrast between CO 2 -saturated and brine-saturated rock can be significant, and hence, there is potential for monitoring saturation effects with CSEM, see [8,54,63,64]. To penetrate sufficiently deep into the subsurface, lowfrequency (0.1-10 Hz) signals are, however, applied with CSEM. For this reason, it is anticipated that the CSEM data will have coarser spatial resolution than seismic AVO data.
Gravimetric methods have been used in many monitoring case studies, e.g., reservoir production monitoring, see, e.g., [37,79,81]. The measured gravitational field in monitoring studies is sensitive to changes in density. The CO 2 density is (in most cases) less than the brine density, and the density change resulting from displacing brine by CO 2 is significant enough to produce detectable gravity signals. In addition to density changes due to different fluid content in the pores, the fluid densities are dependent on pressure (and temperature). Hence, it is possible to monitor pressure and saturation effects with gravity data, although pressure effects on density are often weak. The spatial resolution of gravity data is lower than that of seismic data. The cost of gravity measurements is, however, lower than those of seismic and CSEM measurements. Several studies have concluded that gravity data provide valuable information for CO 2 monitoring, both as stand-alone measurements and as a supplement to other geophysical methods [5,30,41,48].
Combining complementary geophysical data types is not a straightforward process. Many of the joint-inversion techniques that have been suggested in the literature are based on the assumption that the data types are linked through petrophysical or structural relationships. Examples of joint-inversion techniques using petrophysical relationships can be found, e.g., in [2,13,39,61]. In most cases, the petrophysical relationships are empirical and require calibration to experimental data. Alternatively, jointinversion techniques based on structural relationships can be used, where the main assumption is that the data types are functions of the same underlying structure. Examples of structural joint-inversion techniques can be found in [17,29,35,52]. The cross-gradient approach, introduced in [28], is perhaps the most used approach.
While the joint-inversion techniques described above aim to utilize complementary data types in a single inversion process, so-called cooperative inversion techniques [55] aim to invert the data types in separate steps, with the resulting model from inversion of one data type acting as starting model or constraint for the subsequent inversion of another data type. Examples of cooperative inversion techniques can be found in [75,76], where interpreted seismic inversion results are used as structural prior information for CSEM inversion, and in [17,40,70], where inversion of each data type is done in sequence and, in some cases, iterated. Exchanging information between geophysical models in the disparate inversion sequences can be challenging, especially if the spatial resolutions of the data types are different. In [78], seismic velocity and electric resistivity models were coupled through exchange of structural information, but that required an extra inversion step between the (iterated) inversion sequences.
Our inversion strategy belongs within the Bayesian framework, which combines prior knowledge and information from observed data to yield the solution as a posterior probability density function (PDF). With the posterior PDF, a best-estimate geophysical model with associated uncertainty will be provided. In general, a complete characterization of the posterior PDF is only possible by using sampling techniques. Markov chain Monte Carlo (MCMC) methods (for application of MCMC methods to geophysical problems, see, e.g., [9,11,34,66]) can sample correctly from the posterior PDF.
To use different geophysical data types jointly, we follow ideas from cooperative inversion, and further develop an inversion strategy introduced in [77]. We suggest a sequential approach where data with lower spatial resolution are inverted first, and subsequently, the results are applied in the construction of the prior model for the inversion of data with higher resolution. As discussed above, both CSEM and gravimetric data have lower spatial resolution than seismic AVO data. Thus, either CSEM or gravimetric inversion will be performed in the first step, before the seismic AVO inversion in the second step. The construction of the prior model for the seismic inversion is facilitated through the use of the same type of parameterization for the CSEM, gravity, and seismic inversions.
We will apply a model-based parametrization to represent the unknown functions in the inversions. The particular model-based parameterization applied here (see, e.g., [7,76]) is based on the level-set framework, facilitating representation of region boundaries without a priori restrictions on their shapes. It is therefore well suited to represent the boundaries of the images of a large-scale CO 2 plume in the respective geophysical domains, that is, in the electric conductivity, density, and seismic velocity. It is expected that these properties will be slowly varying both within the region corresponding to the plume and outside that region, while the variation can be abrupt when crossing the region boundary. The applied parameterization is able to handle this type of variation using a relatively small number of parameters.
MCMC methods require a huge number of forwardmodel runs for a sufficient description of the posterior PDF, leading to extremely high computational costs for realistic problems. To reduce computational costs, two ensemble-based Bayesian methods, the ensemble Kalman filter (EnKF) [22] and the ensemble smoother with multiple data assimilation (ES-MDA) [21], which require only a moderate number of forward model runs, will be applied. These methods have an underlying Gaussian assumption on the involved PDFs, and can therefore be shown to sample correctly from the posterior PDF only in the case where the prior model is Gaussian and the forward model is linear, in the limit of an infinite ensemble size. They have, however, shown to sample approximately correct in many scientific fields where the forward models are nonlinear, see, e.g., [23] and [1], and references therein. Ensemble-based Baysian methods have been used for inversion of CSEM data [75] and inversion of seismic data [31,32,57,72].
The inversion methodology will be applied to a synthetic CO 2 monitoring test case where the geophysical reference ('true') models are made based on fluid-flow simulations of large-scale CO 2 injection in the Skade formation [19]. The Skade formation is considered a potential candidate for storing large amount of CO 2 in the North Sea [36]. Thus, the test case serves as a feasibility study to asses the effectiveness of long-term monitoring of CO 2 sequestration in the formation. The numerical setup and results from this study will be presented in Section 4. The rest of the paper is organized as follows: in Section 2, the forward models for CSEM, gravimetry, and seismic methods are presented. Section 3 describes the inverse problem and consists of three main parts: the reduced, model-based parameterization is described in Section 3.1 followed by the ensemble-based, Bayesian methods in Section 3.2, and lastly sequential utilization of CSEM, gravimetric, and seismic data is discussed in Section 3.3. We end the paper with some concluding remarks in Section 5.

Forward Models
The rock physics model, converting reservoir saturation and pressure to geophysical variables, is described in Section 4.1. The three geophysical methods used in this paper-CSEM, seismic, and gravimetry-are simulated using three separate forward solvers. Common for all three forward models is that the computational domain is 2D, denoted ∈ R 2 . In the following, let x = (x, z) T denote an arbitrary position vector, and let N g be the number of grid cells when is discretized.

CSEM
The governing equation for CSEM signals is the frequencydomain wave equation given by where e denotes the electric field and j denotes the electric source current distribution. Furthermore, ω denotes the angular frequency, μ denotes the magnetic permeability, and i = √ −1. It is assumed that no free electric charges are present, and displacement currents are neglected due to the application of low-frequency signals. The electric conductivity, σ , is assumed to be vertically transverse isotropic, where the vertical conductivity is fixed to twice the magnitude of the horizontal conductivity. Since the factor between vertical and horizontal conductivity is fixed, we only focus on horizontal conductivity and denote it by σ . The wave equation, (1), is solved using the 2.5D finite element simulator MARE2DEM (see [45,46] for extensive description of the simulator), where is discretized into N g triangular elements using Triangle [67].

Gravimetry
For gravimetry, the gravity field, g, must satisfy the following equations where G denotes the universal gravitational constant and ρ denotes the density. In gravimetric inversion, the responses are calculated as the difference in the vertical gravitational field, g z , between two vintages. The general solution of (2) for g z is given as To solve (3), the analytical approach given in [71] is used, where is discretized into N g triangular elements with Triangle. It is assumed that ρ is constant for each element.

Seismic AVO
The general forward model for sesimic AVO is given by the Zoeppritz equations, which describe the reflection and transmission coefficients of primary (P) and secondary (S) waves at a rock boundary. The Zoeppritz equations are tedious to evaluate numerically, and often a linear approximation is used instead. In this paper, we use the linear approximation in [4], where the reflection coefficient between incident and reflected P waves (R pp ) is given by V p and V s denote the P-and S-wave velocities, θ denotes the incident (or reflection) angle, and the overbar denotes average velocity over the reflecting surface. Note that the linear approximation (4) describes the reflection coefficient from one reflecting surface and one θ , and is only valid for weak velocity contrasts and θ significantly below the critical angle. To expand (4) for use with multiple θ and reflecting surfaces, we followed the description in [12,Appendix B]. Note that, it is assumed that effects such as geometrical spreading, multiples, and absorption have been removed or corrected for in a (pre-)processing step, and it is also assumed that deconvolution and time-depth conversion have been performed. Furthermore, it is assumed that data recorded as a function of source-receiver offsets have been transformed to be function of incident angles, θ.
(The transformation can be done with, e.g., ray-tracing or approximate offset-angle formulas.)

Inverse problem
We consider the sequential inversion strategy where electromagnetic or gravimetric data is inverted first, and utilize results from this inversion to construct a prior model for the inversion of seismic data. Several inverse sub problems, involving different physical quantities, will therefore be considered in this paper: inversion of seismic data to seismic velocity; electromagnetic data to electric conductivity; and gravimetric data to density. Major components of the inversion methodologies applied to solve the different sub problems will, however, be identical.
To keep the description of these common features of the methodology concise, we introduce a common notation. Let d ∈ R N d denote measured data, f (x) ∈ F the unknown function to be estimated, and g (f ) ∈ R N d the forward model output. With frequency-domain CSEM, data will be complex. In that case, d and g will be augmented vectors containing real and imaginary parts of the complex data and forward model output, respectively. To solve the associated inverse problem, that is, to estimate f (x) from d and additional available information, we use an ensemblebased Bayesian method in conjunction with a model-based parameterization, q (x; m) ∈ Q, of f (x), where m ∈ R N m denotes the unknown parameter vector and Q ⊂ F. A description of how results from the different inverse sub problems are combined, and a reasoning behind the way the data types are utilized together in the sequential strategy, is given in Section 3.3. Results obtained with this strategy will be compared with those obtained by direct inversion of seismic data in Section 4. Note also that V s and ρ are fixed at their true values in all seismic inversions.

Reduced, model-based parameterization
The parameterization applied in this paper is not standard. We will therefore relate it to more standard parametrizations. The vast majority of papers concerned with solving inverse problems parameterize the unknown function by a constant value in each forward model grid cell (pixel parameterization). We start by writing the pixel parameterization in a mathematical notation suitable for this paper. Let the inversion region, , be the union of the members in a set of predetermined non-overlapping subdomains, j N m j =1 , and let χ j (x) denote the indicator function for j (i.e. χ j = 1 in j and zero elsewhere). One may then write q (x; m) as a linear basis expansion that is, a standard zonation with zones j N m j =1 . Letting N m = N g and letting j correspond to grid cell number j results in (5) being a pixel parameterization of f .
With standard zonation, the zones are fixed when solving the inverse problem while the values in each zone, the m j s, are estimated. In order to change zone boundaries, one may introduce dependencies on a set of control parameters in χ j N m j =1 . To this end, let the parameter vector consist of two sub vectors; m T = (c T , a T ), where c ∈ R N c , a ∈ R N a , and N c + N a = N m , and write q (x; m) as a non-standard zonation with N c zones, The dependencies of the χ j s on a may be utilized to change the boundaries of the corresponding zones, while c now plays the role that m has in a standard zonation. We will do shape estimation, and therefore parameterize f by what can be seen as an approximation to a particular type of non-standard zonation-the reduced, smoothed level-set representation. Details on the representation can be found in [75,76], and references therein. For the convenience of the reader, we have summarized the representation in Appendix 1. Note that the representation can easily be extended to 3D, following [7].

Ensemble-based Bayesian inversion
The relation between the random variables d and m is where d ∈ R N d denotes the a realization of the measurement error vector, and g (q (x; m)) has been written g (m) for convenience. Before any data have been applied, m follows the prior probability density function (PDF) p(m), and for a given m, d follows the conditional PDF p(d|m) = p( d = d − g(m)), which we assume is a zero-mean Gaussian distribution with covariance matrix C d . Bayes rule for PDFs then implies that the conditional PDF of m given d obeys The posterior PDF, p(m|d), describes the complete solution to the inverse problem. An analytical expression for p(m|d) is only feasible when p(m) is Gaussian and g (m) is linear. Otherwise, p(m|d) must be characterized through sampling. Markov chain Monte Carlo methods sample correctly from p(m|d), but are prohibitively computationally expensive for realistic geophysical problems. In Sections 3.2.3 and 3.2.4, we describe the computationally feasible, approximate sampling methods for parameter estimation that are applied in this paper; the ensemble smoother with multiple data assimilation (ES-MDA) [21] and the ensemble Kalman filter (EnKF) [22]. For convenience of the reader, we will, however, first briefly describe the Kalman filter [43] and the ensemble smoother [50] in a parameter-estimation setting.

Kalman filter
Let m 0 and m 1 denote the prior and posterior model, respectively. If p(m 0 ) is Gaussian and g (m) is linear, that is, g (m) = Am, p(m 1 |d) will be Gaussian. Its mean is expressed by the Kalman filter equations [42], where K denotes the Kalman gain, andȳ and C y denote the mean and auto covariance of y, for any y. Furthermore, C yz denotes the cross covariance between y and z, for any y and z.

Ensemble smoother
If p(m 0 ) is not Gaussian and/or g (m) is nonlinear, p(m 1 |d) must be characterized by sampling. With the ensemble smoother, (9)-(11) are applied to each member in the sample (ensemble member) and C m 0 w and C w in (10) are replaced by the empirical covariances, C m 0 w and C w , calculated from the corresponding ensembles.
To be able to write the ensemble-smoother equations in a concise manner, let M and D denote the matrices holding the model ensemble members and data ensemble members as columns, respectively; M = m 1 , m 2 , . . . , m N e , D = d 1 , d 2 , . . . , d N e . Hence, M 0 contains a sample from p(m 0 ) and M 1 contains a sample from p(m 1 |d).
To generate M 0 , we use the Cholesky decomposition method described in Appendix 2. The matrix D contains a sample from N (d, C d ), where N denotes the Gaussian distribution. Defining the matrices G(M) = g(m 1 ), g(m 2 ), . . . , g(m N e ) and W = w 1 , w 2 , . . . , w N e , the ensemble-smoother equations may be written as From M 1 , one may calculate approximations to the two first moments of p(m 1 |d),m 1 and C m 1 , empirically. The ensemble smoother thus provides a best estimate and a quantification of its uncertainty. The sample mean, m 1 , can be obtained by inserting m 1 for y in Appendix 3, while the sample covariance, C m 1 , can be obtained by inserting m 1 for y and z.

Ensemble smoother with multiple data assimilation
When g (m) is nonlinear, iterations are generally required to obtain an accurate estimate for m, while the ensemble smoother assimilates d in a single step. In an attempt to alleviate this problem with the ensemble smoother, the ES-MDA allows for d to be assimilated in N u smaller steps in a statistically correct manner. To this end, a sequence of real positive scalars, η 1:N u , is introduced, and it is required [21]. The data covariance in cycle number u is inflated by η u , that is, D u contains a sample from N (d, η u C d ), such that the estimate after completion of u cycles will depend on η 1:u .
To describe the ES-MDA in the ensemble-matrix notation introduced in Section 3.2.2, let M u denote M after assimilation cycle number u has been completed, that is, M u contains a sample from p(m u |d, η 1:u ). The ES-MDA equations for cycle number u may then be written as After cycle number N u , one obtains the final updated model ensemble M N u , from which one may calculate empirical approximations to the two first moments of the posterior PDF, p(m N u |d, η 1:N u ), in a similar mannner as described in the final paragraph of Section 3.2.2. Typical values for N u are 4 -8. Theoretical and practical procedures for choosing η u can be found in [65].

Ensemble Kalman filter
While the ensemble smoother assimilates all data simultaneously in a single step and the ES-MDA assimilates all data simultaneously in a sequence of smaller steps, the EnKF is a sequential estimation methodology that assimilates part of the data in each assimilation cycle until all available data have been assimilated. It has been shown [25,26] that sequential estimation can be expected to outperform simultaneous estimation in a single step for weakly nonlinear problems.
To describe the EnKF in the ensemble-matrix notation, we split D and G into submatrices, where N v denotes the number of data groups, D v denotes the ensemble matrix for data group number v, and G v denotes the ensemble matrix for the corresponding forward model. Furthermore, let M v denote M after assimilation of v data groups have been completed, that is, M v contains a sample from p(m v |d 1:v ). The EnKF equations for cycle number v may then be written as After cycle number N v , one obtains the final updated model ensemble M N v , from which one may calculate empirical approximations to the two first moments of the posterior PDF, p(m N v |d 1:N v ), in a similar manner as described in the final paragraph of Section 3.2.2.
Note that the computational expense is approximately equal to N e times the computational expense of one forward model run for EnKF, and (N u · N e ) times the computational expense of one forward model run for ES-MDA. Hence, ensemble-based methods are suitable for problems with large N m and N d .

Sequential utilization of different data types
Seismic P-wave velocity depends on saturation and pressure. We will not invert for saturation or pressure directly, but rather invert for the geophysical parameters, σ , ρ, and V p . In particular, we will use inversion results for σ and ρ to improve inversion results for V p . If desired, saturation and pressure effects can be inferred from these inversion results.
In the very early phase of a CO 2 injection, a pressure front is advancing, followed by a saturation front which is advancing much more slowly. Hence, in a later phase, no pressure front is found in the vicinity of the advancing saturation front. So, except in the very early stages, the pressure variation during CO 2 injection in a reservoir has a different character than the saturation variation, which defines the CO 2 plume.
These characteristics are reflected in the true V p , depending on the rock physics. It may, however, be difficult to identify them in the V p obtained by inverting seismic data, due to data and modelling errors and instability of the inversion. In particular, using a pixel parameterization to represent V p may result in pixel-scale errors that blur the underlying large-scale CO 2 plume. The parameterization we apply here, however, directly represents large-scale structures, like a CO 2 plume. This means that while the inversion may not ensure a correct placement and shape of the plume, it will by construction of the parameterization avoid blurring of the plume by pixel-scale errors.
Electric conductivity depends on saturation, but not on pressure. Density depends on saturation and pressure. The variation in ρ across the CO 2 -plume boundary is, however, significantly stronger than the variation due to pressure differences at neighbouring locations. Abrupt changes in σ or ρ with x therefore indicate the location of the CO 2 -plume boundary (at least when using the parameterization described in Section 3.1, since pixel-scale errors then are avoided). The resolution with which σ and ρ can be determined from CSEM and gravimetric data, respectively, is, however, coarser than that with which V p can be determined from seismic data. It is therefore not straightforward to utilize information about the CO 2 plume obtained from CSEM or gravimetric inversion in the seismic inversion. It would, for example, not be advisable to fix the CO 2 -plume boundaries to those obtained from CSEM or gravimetric inversion when inverting the seismic data.
In [48], a sequential approach for CO 2 estimation in the Sleipner field was proposed, where seismically derived saturation changes were used as input to gravity modelling. Part of the background for their approach was that timelapse pore pressure changes were moderate at Sleipner, so that saturation effects dominate. A main aim for the Skade modelling study, motivating our work, was to investigate large-scale CO 2 injection with a small/realistic number of injection wells, such that large pressure effects must be anticipated. Our 'end product' is the seismic-velocity estimate, and we are just as interested in the pressure effect reflected in the seismic velocity as in the saturation effect. We therefore suggest an alternative sequential, twostep inversion strategy for joint utilization of CSEM or gravimetric data with seismic data. The main idea with the sequential procedure is to first gain knowledge about the location and shape of the CO 2 plume using CSEM or gravity data, which are both mainly influenced by saturation changes, and have lower resolution than seismic data. Subsequently, this knowledge is utilized to obtain an improved prior model for the seismic inversion, where we aim to obtain good estimates of both saturation-induced and pressure-induced changes in V p . Implementation of this knowledge into the prior model for V p is facilitated by using the same parameterization to represent σ , ρ, and V p .
We summarize the two-step, sequential inversion strategy as follows: Step 1: Invert CSEM or gravimetric data to get an approximate location and shape of the CO 2 plume.
Step 2: Invert seismic data with prior information on the location and shape of the plume from step 1.
Note that we do not gain knowledge about variation of V p with pressure from CSEM or gravity inversion. Hence, when making the prior model for seismic inversion, information on pressure effects on V p must be apprehended from other sources, e.g., converting reservoir simulation results to seismic velocity via rock physics relations.
The sequential inversion strategy shares common themes with many current seismic full-waveform inversion schemes. There, low-frequency inversion results are used to get information on the general structures, which in turn are used to build initial models for subsequent high-frequency inversions. Hence, our two-step, sequential inversion strategy could be adapted in the case of seismic full-waveform inversion with step 1 using low-frequency seismic inversion, possibly together with CSEM or gravimetric inversion, and high-frequency seismic inversion in step 2.

Numerical experiments
The inversion methodology described in Section 3 was applied to synthetic data generated from simulated CO 2 injection in the Skade formation. The sequential inversion strategy described in Section 3.3 was employed in two separate test cases: one where step 1 was performed with CSEM inversion, and the other where step 1 was performed with gravity inversion. We compared the inversion results from of the two acquisition methods in step 1. Subsequently, we wanted to assess how the different prior models from step 1 influenced on the final results of step 2. Finally, the performance of the sequential inversion strategy was compared with seismic inversion without any prior information from CSEM or gravity inversion results.
The EnKF was used to perform CSEM and seismic inversions, while the ES-MDA was used in the gravity inversion. The reason for choosing the ES-MDA for gravity inversion is that no reasonable way of grouping the data was found; see the brief discussion on data grouping in Section 4.2. The ES-MDA is computationally more expensive than the EnKF: the computational cost of one assimilation cycle in the ES-MDA equals the total computational cost of the EnKF. However, the gravity forward model has a low computational cost, which made the use of the ES-MDA feasible.
When we performed CSEM and seismic inversion, the inversion parameters were σ and V p , respectively. For gravity inversion, however, the inversion parameter was ρ (difference between density at the time data was collected and density prior to CO 2 injection) rather than ρ. The reason for using ρ as inversion parameter is that g z is difficult to measure and the truly meaningful data content is the change between two vintages, g z [38].
Since a total of three seismic inversions will be conducted, a shorthand label for each one is given as follows: AVO c is short for step 2 with prior information from CSEM inversion results; AVO g is short for step 2 with prior information from gravity inversion results; and AVO w denotes seismic inversion without prior information from either CSEM or gravity inversion results.

Skade formation and synthetic data generation
Together with the Ve Member, the Utsira Formation, and Upper Pliocene sands of the Nordland Group, the Skade Formation forms the outer part of a large deltaic system with its source area on the East Shetland Platform. The Skade Formation, Lower Miocene, consists of marine sandstones deposited over a large area of the Viking Graben. The maximum thickness is more than 500 m and the thickness decreases rapidly towards the east, where the sands terminate towards large Oligocene shale diapirs. Based on available pore volume, the estimated storage capacity of CO 2 in the Skade formation is approximately 15 Gton [10].
To simulate large-scale CO 2 injection in the Skade formation, the commercial reservoir simulator Eclipse™ was used. The 3D reservoir model was set up following [19]. The formation has not been well characterized geologically; thus, the porosity and permeability are assumed to be homogeneous with values taken within the range of Utsira sand data. Specifically, porosity was set to 0.16 while horizontal and vertical permeability were set to 1476 mD and 147.6 mD, respectively. Three injection wells were set up in the south part of Skade (see Fig. 1), and CO 2 was injected over a 50-year period (year 2020-2070) with injection rate set as high as possible without exceeding the fracture pressure anywhere in the formation. (The fracture pressure was estimated based on rock-mechanical relations expected to be valid for the formation.) In total, approximately 3 Gton CO 2 was injected over the 50-year period. The geophysical background model (i.e. before CO 2 injection) from the seabed to top Mjur formation was built using depth-converted seismic horizons and upscaled properties from a well log (15/9-3, located at the south end of the formation). In the CO 2 injection period, standard petrophysical relations described below were used to convert saturation and pressure to V p , σ , and ρ. In the following, let subscripts 1 and 2 denote properties before and after CO 2 injection, respectively, and let the change in a generic property, τ , be denoted by τ = τ 2 − τ 1 . Furthermore, let S CO 2 denote saturation of CO 2 and P denote pressure. To generate the conductivity model, Archie's second law, assuming constant porosity, was used, To generate P-wave velocity and density models, the following relationships from [47] were used, In (23) and (24), b, k, l, and m are empirical parameters, which in this paper are given as calculated from Utsira data. We note that the model in [47] assumes that the coefficients in (25), which typically are calibrated against a few rock samples, are valid everywhere and independent of porosity. In [49], the author proposed an improved model, where they account for heterogeneous porosity, initial saturation, and pressure.
To set up the monitoring test case, the true geophysical models were generated using S CO 2 and P from year 2070 (i.e. at the end of the CO 2 injection); confer Fig. 2. We focused the test case area around injection well W2. To generate synthetic CSEM and gravity data, the commercial software COMSOL™ was used, while the reflection coefficient approximation described in Section 2.3 was used to generate the synthetic AVO data. Note that V s in (4) was generated from V p using a V p V s ratio of √ 14 2 ≈ 1.8708, which lies within the range for sandstone formations [60].

Set up of experiments
In the numerical experiments, it was assumed that the geology was sufficiently well known such that only includes the Skade formation, while geophysical parameters in \ (i.e. the computational domain outside the inversion domain) are fixed to the background model. For the CSEM inversion, σ in \ was given as seen in Fig. 3a. For gravity inversion, ρ in \ was zero; see Fig. 3b. Gaussian random noise will be added to the data. For CSEM and gravity data, the noise standard deviation will be set relative to the magnitude of the data, see, e.g., [51,66]. For seismic (reflection coefficient) data, however, we set the noise standard deviation to a fixed value. Alternative noise models are viable for all data types, and one may anticipate that the choice of noise models can influence on the estimation results. Since fixing the inversion region is a simplification of the inversion problem, we have tried to compensate by applying data error models that can be seen as conservative, see, e.g., [3]. A thorough investigation of the influence of data error models on inversion results is, however, beyond the scope of this paper.
For the seismic inversions, the data are R pp within , hence no background model is needed. Note that in the following, CSEM and gravity data are contaminated with random noise relative to the magnitude of the data, similar to what is common in geophysical literature, see, e.g., [51,66]. For seismic data, however, it is well known that amplitudes can be difficult to measure because of noise and problems with amplitude-and frequency-preserving processing. This, together with the fact that R pp can often be zero, or very close to zero, leads us to choose a random noise with fixed variance for the seismic data.
The data used for the CSEM inversion, e x , were extracted at 26 sea-floor receivers, evenly distributed with 500 m intervals for x ∈ (18000, 30500) m and z = 150 m. The source frequency was 0.25 Hz, and eight source positions, evenly distributed with 2000 m interval for x ∈ (17300, 31300) m and z = 120 m, were applied; see Fig. 3a. Five percent Gaussian noise with noise floor 10 −15 V/Am 2 was added to the data. Furthermore, data from receivers less than 2000 m away from the source position were removed to avoid influence from the direct wave.
The data used for the gravity inversion, g z , were extracted at 45 sea-floor receivers, evenly distributed with 500 m interval for x ∈ (12500, 34500) m and z = 150 m; see Fig. 3b. Ten percent Gaussian noise was added to the data.
The data used for the seismic inversions, R pp , were extracted at θ = (5,10,15,20,25,30) • . Recall from Section 2.3 that we have assumed that the θs have been converted from source-receiver offsets in a processing step. Gaussian noise with standard deviation 0.007 was added to the data, which typically lie within the range R pp ∈ (0, 0.04).
For the CSEM inversion, the data were divided into a subset of eight groups (N v = 8), where each group consisted of data obtained with one particular source position. The first group corresponded to source position (x, z) = (17300, 120) m, and the subsequent data groups followed the adjacent source positions as defined above. For the seismic inversions, the data were divided into six groups (N v = 6), where each group consisted of data from one element in θ. The first group corresponded to θ 1 = 5 • , and subsequent data groups followed the increasing angles up to 30 • , as described above. Note that the ordering of data may influence the inversion results [27], but obtaining the 'best' practice for grouping the data is beyond the scope of this paper.
In the gravity inversion with the ES-MDA, all data are used simultaneously. (Gravity data can only be grouped by receiver position, thus only part of would have been covered by each sequential step, had the EnKF been used.) The number of assimilation cycles were chosen as N u = 8 Fig. 3 a σ and b ρ in at year 2070. Source positions (for CSEM) are indicated by • and receivers (for both CSEM and gravimetry) are indicated by . Note that is outlined with a solid black line and the inflation factor was chosen as η u = 1/N u for u ∈ [1, N u ]. Optimal tuning of N u and η 1:N u is beyond the scope of this paper.
The representation given in Section 3.1 was applied to model two regions: inside and outside the CO 2 plume. The shape of the modelled CO 2 -plume boundary is given by  ζ (x, a), and N a = 45 parameter grid nodes, evenly distributed over the Skade formation (nine parameter grid nodes in the x-direction from x = 12000 m to x = 35000 m, and five parameter grid nodes in z-direction from z = 890 m to z = 1130 m), were applied in (30) to represent φ (x, a).
Since it it assumed (see, Section 4.1) that σ is independent of pressure, and since the simulated CO 2 saturation does not vary much inside the plume, (28) was used to represent σ in the CSEM inversion. Since the variation of ρ with pressure is weak, (28) was also used to represent ρ in the gravity inversion.
Since the variation of V p with pressure is more pronounced, V p was represented with (29) in the seismic inversion. We let k 1 (x; c 1 ) and k 2 (x; c 2 ) be given by (30) with N c 1 = N c 2 = 15, and let the parameter grid nodes for k 1 (x; c 1 ) and k 2 (x; c 2 ) be evenly distributed over the same area as for the representation of φ (x, a), but now with five nodes in x-direction and three in z-direction.
Initial ensembles for the CSEM, gravity, and AVO w inversions were generated according to the description in Appendix 2, with N e = 100 for the CSEM and gravity inversions, and N e = 1000 for the AVO w inversion. (Initial ensembles for AVO c and AVO g are described in Sections 4.3.4 and 4.3.5, respectively.) The values for N e used in the experiments were chosen such that N e > N d , to avoid problems with strong unwarranted reduction of the variability among ensemble members. (See, for example Chap. 14 in [23] for a discussion of this issue, also known as ensemble collapse). The mean prior model,m 0 , was selected to reflect the situation just after the CO 2 injection started. Figure 4a shows the resulting ζ (i.e. zero level set; see Appendix 1) while the resulting initial values for σ , ρ, and V p are Table 1 Input parameters for generation of C a 0 . Note that the same C a 0 was used in the CSEM, gravity, and AVO w inversions. Confer Appendix 2 for description of input parameters Fig. 4b-d. To generate C m 0 for the CSEM, gravity, and AVO w inversions, it was assumed that the C c 0 1 = C c 0 2 = C c 0 and the input parameters in Tables 1 and 2 were used. Recall that for CSEM and gravity inversions, C c 0 reduces to a variance, β, since (28) was used.

Inversion results
In this section, inversion results using the sequential, twostep inversion methodology and AVO w are presented. To make it easier to compare inversion results where different forward model simulators (with different discretizations) have been used, all inversion results are mapped onto a separate plotting grid. The plotting grid was made using equidistant grid cells in xand z-direction, covering . Furthermore, the figures have been vertically exaggerated.

Step 1: CSEM inversion
The true σ for the CSEM inversion is shown in Fig. 5a. Figure 5b and c show mean of the initial and final updated σ . From Fig. 5c, it is seen that shape of the CO 2 plume, given by the low-conductivity structure, has good correspondence with the true shape of the plume, with some deviations at the top of the formation. The conductivity of the CO 2 is well estimated, while the brine conductivity is underestimated.
In Fig. 6a and b, it is seen that the variance of σ has been reduced significantly. Some high variance, relative to other areas, can be seen on the left side of the formation, Fig. 4 a ζ generated usingā 0 . b σ , c ρ, and d V p models made withm 0 for the CSEM, gravity, as AVO w inversions, respectively Table 2 Input parameters for generation of C c 0 . Note that the unit for β is S/m for CSEM, kg/m 3 for gravity, and m/s for seismic inversion. Confer Appendix 2 for description of input parameters indicating higher model uncertainty in this area. The spread of the initial ensemble members, shown in Fig. 6c, has been much reduced in the final ensemble, see Fig. 6d, again indicating a reduction in model uncertainty.

Step 1: Gravity inversion
In Fig. 7b and c, the mean of the initial and final updated ρ is shown. Comparing Fig. 7c with the true ρ in Fig. 7a, it is seen that the shape of the CO 2 plume is not well approximated on the right side, while on the left side it is closer to the true shape. It is also seen that the ρ values outside the CO 2 plume are well approximated in most areas, except a small area in the bottom left corner of the formation.
From Fig. 8a and b, it is seen that there are areas where the variance has not been reduced much from initial to final ensemble, especially around the CO 2 plume front. Figure 8c and d show that the spread of the ensemble members from initial to final has been reduced to some extent. In total, Fig. 8a and d show that the model uncertainty has only been partially reduced in the ensemble-based inversion, and that the areas where the estimation deviates most from the true ρ have the highest uncertainty.

AVO w
Before assessing the inversion results from step 2 with AVO c and AVO g , the results from AVO w is presented.
The means of the initial and final updated V p s are shown in Fig. 9 b and c. Comparing the mean of the final updated V p with the true V p in Fig. 9a, it is seen that the left side of the formation is well approximated, while the CO 2 plume (given by the low-velocity shape) is not well approximated on the right side of the formation. From Fig. 10a and b, it is seen that the variance has been reduced much except in a few areas at the top of the formation. Looking at Fig. 10c and d, it is seen that the spread of the ensemble members has been reduced much, especially for the left CO 2 front, while significant model uncertainty can be seen on the right side and top left of the formation. The areas with highest uncertainty are where the deviation of the final updated V p is largest compared with the true V p .

Step 2: AVO c
Following the sequential, two-step inversion strategy, knowledge about the location of the CO 2 plume from the CSEM inversion was used to make the prior model for AVO c . Specifically, the mean of the final updated a from the CSEM inversion was used asā 0 in AVO c . Figure 11a shows ζ generated withā 0 . Since V p depends on both saturation and pressure, and we do not gain information on pressure from CSEM inversion, we letc 0 be the same as for AVO w , see Fig. 11b. (If we have information on pressure from, e.g., well measurements, a betterc 0 can be made.) The initial ensemble was generated with 1000 realizations, where C a 0 and C c 0 were the same as given for AVO w in Tables 1 and 2 except β = 10 for C a 0 (to reflect that the step 1 inversion has reduced the prior uncertainty in step 2 for the shape of the CO 2 plume).
In Fig. 12b and c, the means of the initial and final ensembles are shown. Comparing Fig. 12c with the true V p in Fig. 12a, it is seen that mean of the final updated V p approximates the true V p well, both in terms of the shape of the CO 2 plume and V p distribution inside and outside the CO 2 plume. Step 1 CSEM inversion. Variance of the a initial and b final updated σ ; ζ generated using ensemble mean (solid black) and members (grey) from the c initial and d final ensemble Fig. 7 Step 1 gravity inversion. a True ρ. Mean of the b initial and c final updated ρ Fig. 8 Step 1 gravity inversion: variance of the a initial and b final updated ρ; ζ generated using ensemble mean (solid black) and members (grey) from the c initial and d final ensemble    Fig. 13a and b, it is seen that the variance has been reduced much from initial to final ensemble. A similar conclusion can be made by looking at the spread of the initial and final ensemble members in Fig. 13c and d, where it is seen that the uncertainty on the shape of the CO 2 plume is small.

Step 2: AVO g
To generate the initial ensemble for AVO g , the same procedure as for AVO c , discussed in Section 4.3.4, was used:ā 0 was given as the mean of a from the final ensemble in step 1, whilec 0 was the same as for AVO w (following the same arguments as in AVO c ); see Fig. 11c and d. Furthermore, C c 0 and C a 0 were the same as in AVO c , and 1000 realizations were generated for the initial ensemble.
The means of the initial and final ensembles are shown in Fig. 14b and c, and it is seen in Fig.14c that the shape of the CO 2 plume and the V p distribution approximate the true V p in Fig. 14a well.
Looking at Fig. 15a and b, it is seen that the variance has been reduced much from initial to final ensemble, with some higher variance around the right CO 2 plume front. From Fig. 15c and d, it is seen that the spread of the ensemble members has been reduced much, especially around the left front of the CO 2 plume.

Data misfit
To make a quantitative comparison of AVO w , AVO c , and AVO g , we calculate the data misfit using ensembles from all three inversions. The data misfit for a single ensemble member, m j , is calculated as, for AVO c and AVO g were similar to that for AVO w , and are therefore not shown.) We see that the data misfit from all three inversions has been reduced much from initial to final updated ensembles, and end up close to N d (often used as solution criteria in inversion algorithms within the classical inversion framework). Comparing {O final j } N e j =1 from AVO w to AVO c and AVO g , we see that they are statistically similar.

Discussion
In the seismic inversions done in this paper, the data variance was set to an absolute value of 5 · 10 −5 . We have performed seismic inversions with different absolute values for the data variance and similar results as shown in Section 4.3 were obtained.
The numerical results shown in this section are based on a large-scale CO 2 injection study where the goal was to inject as much CO 2 as possible without creating hazardous over-pressure that can lead to, e.g., fault reactivation and fracturing. We have also applied the sequential inversion strategy in a preliminary CO 2 injection study, where a relatively small amount of CO 2 was injected (not shown in this paper). Here, the benefit of the sequential, two-step inversion strategy over just performing seismic inversion was not so clear. Since the spatial resolution of CSEM and gravimetry is lower than that of seismic, and the sensitivities of these methods are dependent on the amount of CO 2 injected, there will be a point where the benefit of performing CSEM or gravity inversion prior to seismic inversion will be minimal.

Conclusions
A Bayesian sequential inversion strategy for joint utilization of CSEM or gravimetric data with seismic AVO data for monitoring purposes has been presented and applied to a test case based on simulation of large-7scale CO 2 injection in the Skade formation. The strategy consists of two steps: In step 1, we invert CSEM or gravity data to get an approximate location and shape of the CO 2 plume, and in step 2, the inversion result from step 1 is used in the construction of the prior model for seismic AVO inversion. The unknown geophysical model parameters-electric conductivity, density, and seismic velocity-are represented using a model-based parameterization. The parameterization is based on the level-set framework which allows for representation of region boundaries defined by the largescale CO 2 plume and slowly varying geophysical properties inside and outside the plume. By using parameter grids detached from the forward model grid, the model-based parameterization uses far less parameters in the inversion compared with equivalent methodologies using a pixelbased parameterization.
To solve the inverse problems considered in this paper, ensemble-based Bayesian methods are used. For CSEM and seismic inversions, we applied the EnKF, while ES-MDA was used for gravity inversion. Both these ensemble-based methods provide an approximate sample from the true posterior PDF at moderate computational cost.
Numerical results from step 1 of the inversion strategy showed that inversion of CSEM data provided a better approximation of the shape and location of the CO 2 plume than inversion of gravimetric data. Numerical results from step 2 of the inversion strategy showed, however, that the seismic velocity model was well identified using prior information from either CSEM or gravity inversion results. Numerical results from seismic AVO inversion without any prior information from CSEM or gravity inversion showed that the seismic velocity model was only partially recovered. Hence, utilizing CSEM or gravity data with seismic AVO data with the sequential inversion strategy improved the seismic inversion results significantly.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommonshorg/licenses/by/4.0/.

Appendix 1. Reduced, smoothed level-set representation
Recalling the notation introduced in Section 3.1, let denote a set of real-valued, continuous functions onthe level-set (LS) functions. Utilizing this set to construct j N c j =1 in a particular manner will render (6) a LS representation. With N c > 2, alternative LS representations (LSR)s exist [18,56,59,80] which are able to represent between N φ + 1 and 2 N φ subregions using N φ LS functions. For detailed expositions of the LSRs proposed by [80] and [59] in the context of modelling of geophysical exploration problems, we refer to [76] and [75], respectively. We will, however, only require the case where N c = 2, in which case the LSR is unique and only a single LS function, φ, is applied.
To arrive at the LS representation from (6) with N c = 2 inserted, we first replace the explicit dependence of χ 1 and χ 2 on x and a by an implicit dependence through the LS function, Next, we select 1 as the part of where φ (x; a) > 0. Since 2 = \ 1 , we obtain the LSR in standard notation, H (φ (x; a))) , where H denotes the Heaviside function (indicator function for the positive real axis). There are few restrictions on φ. Hence, the LSR is a very flexible way to represent subregions in , as illustrated in Fig. 17. The shapes of 1 and 2 are governed by the LS function, whose spatial variation is controlled by the parameters in a.
The LSR has been extended [18] to incorporate arbitrary spatial variation within each zone by replacing (28) with where c 1 ∈ R N c 1 , c 2 ∈ R N c 2 , and N c 1 + N c 2 = N c . Both (28) and (29) will be applied in numerical examples, where relevant quantities, such as N c 1 and N c 2 , will be specified. To complete the general description of the LSR, the dependency of φ on x and a must be specified. When applying (29), also the dependencies of k 1 on x and c 1 and k 2 on x and c 2 must be specified. We will apply the same type of representation for the LS function, φ, as for the coefficient functions, k 1 and k 2 .

A.1 Reduced parameterization of level-set and coefficient functions
Let ψ represent either of the functions φ, k 1 , or k 2 , and correspondingly, let b represent either a, c 1 , or c 2 . We express the dependency of ψ on x and b by [6] ψ ( The basis functions {ξ k } N b k=1 are defined on a rectangular parameter grid that is not attached to, and much coarser than, the forward model grid (Fig. 18a). Hence, N b N g , and our parameterization is therefore significantly reduced with respect to a pixel parameterization. There will, however, still be sufficient flexibility to approximately represent the large-scale structures that we aim to estimate.
While alternative representations are viable, we represent ψ in a finite-element fashion [6], and let ξ u be a normalized piecewise bilinear function with support on the four rectangular elements adjacent to node u (arbitrary) (Fig. 18b). Its value is unity in node u and zero in all other nodes. Figure 18c shows node u and three of its adjacent nodes, v, r, and s, and the supports of the basis functions associated with these four nodes. Figure 18d shows the element where ξ u , ξ v , ξ r , and ξ s have common support. The projections of ξ u , ξ v , ξ r , and ξ s onto this element are normalized bilinear functions, so whenever ψ is to be evaluated at a forward model grid point, its value is calculated using bilinear interpolation.

A.2 Smoothed level-set representation
We replace H in the LSR by a smoothed approximation, resulting in q (x; m) no longer being a zonation since H will have global support in . Introducing smoothness in q can be beneficial since the nonlinearity in the mapping b Support of ξ u (/). c Supports of ξ u (/); ξ v (|); ξ r (−), and ξ s (\). d Element where ξ u , ξ v , ξ r , and ξ s have common support a → q will decease with increasing smoothness [53]. This consideration should, however, be balanced by the desire to keep a relatively sharp transition between subregions where q (x; m) ≈ c 1 (q (x; m) ≈ k 1 (x; c) if (29) is applied) and subregions where q (x; m) ≈ c 2 (q (x; m) ≈ k 2 (x; c) if (29) is applied). The width of the transition region is decided by the behaviour of φ in the vicinity of its zerolevel set, ζ . Let n be a unit normal vector to ζ . A sharp transition in q over ζ then corresponds to large values of |∇φ · n|. Figure 19 illustrates the difference between a LSR and a smoothed approximation to a LSR when (28) is applied.
where z ∼ N (0, 1) and LL T = C m 0 , with L being a lower triangular matrix. Based on knowledge of the CO 2 plume, e.g. from previous time-lapse vintage data, suitable values form 0 = ((c 0 ) T , (ā 0 ) T ) T can be generated. To generate C m 0 , it is assumed that a and c are not correlated, and, moreover, it is assumed that c 1 is not correlated with c 2 . Hence, where C c 0 i and C a 0 denote covariance matrices for c i , i = 1, 2, and a, respectively. Note that if (28) is applied, the covariance matrix C c 0 i reduces to a scalar variance, β i . To generate C c 0 i and C a 0 , a spherical covariance function [14], is applied. Here, h denotes spatial distance between two nodes in the parameter grid (confer Section A.1), and α denotes the correlation length. The covariance matrix can thus be generated as where the subscript ' * ' denotes either a 0 or c 0 i which leads to ' † ' being either N a or N c i , respectively.
The covariance matrices C c 0 1 , C c 0 2 m and C a 0 can be non-diagonal, to allow for anisotropic correlations. The anisotropy will be specified trough the angle, γ , from the z-axis to the principal axis corresponding to the largest eigenvalue, and the anisotropy ratio, δ. Numerical values for α, β, γ , and δ will be given in Section 4.2.
For an in-depth description of the EnKF applied to a geophysical method (CSEM) and generation of the initial ensemble with the reduced, model-based representation, with examples, see [75].

Appendix 3. Sample mean and covariance matrix
Let Y = y 1 , y 2 , . . . , y N e denote an arbitrary ensemble matrix, and let u denote an N e vector where all entries equal unity. The sample (empirical) mean may then be written as Furthermore, let U = (u, u, . . . , u) (i.e. with N e columns), and define the sample mean matrix as Y = 1 N e YU.
The sample cross-covariance matrix between two arbitrary random vectors, y and z, is then given as The sample auto covariance matrix, C y , is given by (38) with Z = Y.