Methodological development of a probabilistic model for CO2 geological storage safety assessment

In the framework of CO2 capture and geological storage, risk analysis plays an important role because it provides knowledge that is essential for defining and planning carbon injection strategies at the local, national and supranational levels. Every project is at risk of failure, so from the early stages the possible causes of that risk should be considered and corrective measures proposed throughout the process; that is, the risk should be managed. Proper risk management reduces the negative consequences arising from the project. Risk is reduced or neutralized mainly by identifying, measuring and evaluating it, together with the development of decision rules. This report presents a methodology developed for risk analysis and the results of its application. Risk assessment requires determining the random variables that will influence the functioning of the system. It is very difficult to set up a probability distribution of a random variable in the classical sense (objective probability) when a particular event has rarely occurred or has developed only incompletely. In this situation the subjective probability has to be determined, especially at an early stage of a project, when there is not enough information about the system. This subjective probability is constructed from expert judgment, which estimates the possibility that certain random events could happen depending on the geological features of the area of application. The proposed methodology is based on the application of Bayesian probabilistic networks to estimate the probability of risk of leakage. These probabilistic networks graphically define the dependence relations between the variables and the joint probability function through a local factorization of probability functions.


Introduction
There is no human activity without risk, and CO2 capture and storage (CCS) is no exception. In fact, this technology has a risk level similar to that of any other type of industrial activity, particularly those related to the oil and gas industry, for which there are specific regulatory frameworks. With regard to CO2 geological storage (CGS), the problem mainly reduces to providing satisfactory answers to the questions of whether the CO2 may leak and what the consequences of such leaks would be, specifically the short- and long-term consequences for safety, health and the environment [1]. It is important to address these issues properly, among other reasons because of their influence on public acceptance of this technology, a key element for the large-scale implementation of CCS.
In the case of CGS projects there are, on the one hand, the risks arising from the operation of surface facilities, associated with impacts on safety, health and the environment during the injection process. They are similar to those associated with any other type of project, and their evaluation is common practice in various industries. Directly applicable methods and tools for quantitative risk assessment are available and have been used in other industrial processes.
Since the estimates of probabilities and consequences are based directly on experience, confidence in the assessment of those risks tends to be high, although it is usually not free of bias [2,3]. In addition to the above risks, however, there are long-term ones associated directly with the release of CO2 from the storage complex or with induced movements, which can be classified as local and global. The former are associated with effects on the environment or the health of the population. The latter are associated with the impact of the release on the climate change processes that this technology is intended to prevent [4]. In all cases there are economic consequences. In general, the methodologies proposed for assessing the long-term risks arising from CGS are based on those developed and fine-tuned over the past 20 or 30 years for the study of deep geological repositories (DGR) for high-level nuclear waste.
Taking into consideration the experience gained by this working group in the study, development and application of such methodologies, this paper presents the methodology developed for a probabilistic risk assessment (PRA) of a potential site for CGS. This methodology is based on predictive causal modeling, which in turn rests on a formalized abstraction process in which knowledge construction, and the reasoning derived from it, are based on previous information and on virtual predictions made from this starting information, all implemented under the formalism of Bayesian networks (BNs). This generates a PRA process of mutual feedback between project progress and the results of the risk assessment, which allows a gradual and continuous transition from models based on qualitative data to quantitative ones. The whole CGS project can thus be covered by a continuous PRA process, starting from the initial stages of the project, which are characterized by a paucity of information, thanks to the adoption of a subjective perspective on the concept of probability [5] and the application of expert judgment (EJ). Without a doubt, these initial analyses would not be exempt from biases and heuristics. But first, even with limitations, it is preferable to have some information when making decisions on the project; in the worst case, it at least provides a starting point for discussion. And secondly, this problem would be overcome gradually as the available information grows and physical/chemical models are generated to replace the qualitative estimates based on EJ [6][7][8].
This paper presents this new methodology, based on the application of Bayesian probabilistic networks, as well as the results of its application at an early stage of a Spanish R&D project. The methodology probabilistically estimates the risk of leakage at a geological storage of CO2, a key concept of risk assessment. Bayesian networks can graphically define the dependence relations between variables and the joint probability function through a local factorization of probability functions, in order to quantify potential impacts and uncertainties.
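For reference, the local factorization of a Bayesian network is usually written as follows for a network with variables X_1, ..., X_n, where pa(X_i) denotes the set of parents of X_i in the graph. The notation is assumed here for illustration and is not taken from the original study:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{pa}(X_i)\big)$$

Each factor involves only one variable and its direct parents, so every conditional probability table can be elicited or estimated locally.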
Risk assessment requires the determination of the random variables that will influence the evolution of the system, but it is difficult to set up a probability distribution of a random variable in the classical sense (objective probability) when a particular event will rarely occur or has developed only incompletely. Hence, determining the subjective probability will be the customary situation, especially at an early stage of a project, when there is not enough information about the system. This subjective probability is first constructed from expert judgment, which estimates the possibility that certain random events happen depending on specific geological features of the area of application. The Bayesian perspective allows quantitative probabilistic data from calculation models and/or databases to be combined with qualitative estimates of probability from expert judgment, thus allowing a continuous transition from initial qualitative models to final quantitative ones as knowledge of the system develops.

Methods
The methodologies developed for long-term risk assessment of CO2 storage are essentially based on studying the capacity of the storage to hold CO2 over time, and therefore try to determine the long-term behavior of the CO2 initially injected into the formation. These methodologies use structured processes of systems analysis to organize and rationalize the definition of scenarios and to reduce the role of subjective judgment in determining them. The development of a wide range of risks and of the mechanisms underlying them provides a good basis for a systematic assessment of the risks.
The proposed methodology constitutes a new approach to the risk assessment of CO2 geological storage (CGS) activities, based on probabilistic risk assessment (PRA) through Bayesian networks (BNs) and Monte Carlo probabilities [9]. A methodology based on BNs is an attractive tool because of the natural way in which it connects items, its simplicity of maintenance, and because it allows decision-making under conditions of uncertainty. Furthermore, given its conceptual power, such a methodology supports the core activities in the risk assessment of any proposed CGS project, such as mathematical analysis (areas of maximum and minimum variation, stability zones, etc.) or sensitivity analysis to estimate both the impact of the different variables on the uncertainties of the system and the level of uncertainty of the different conceptual models, which are key issues for the treatment of uncertainty.
Because this BN model, oriented towards estimating the probability of risk of leakage in the system, is developed at an early stage of the project, when associated data are scarce, the information available for the model is mostly qualitative and comes mainly from expert judgment (EJ). However, this initial problem will be overcome gradually as the available information increases and physical/chemical models increasingly replace the estimates based on EJ [6][7][8].
The Bayesian perspective allows the probabilistic combination of quantitative data from, for example, calculation models and/or databases with qualitative estimates of probability obtained, for example, from EJ. This allows a transition from initial qualitative models to final quantitative models, passing through intermediate stages that combine both types of probability estimate.
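One common way to illustrate such a combination is a conjugate Beta-binomial update, in which an expert estimate acts as the prior and later observations refine it. The sketch below is only an illustration under that assumption: the Beta prior, the function names and the numbers are hypothetical and are not taken from the study.

```python
def beta_update(prior_a, prior_b, events, trials):
    """Conjugate update of a Beta(prior_a, prior_b) prior with binomial data.

    Assumption: the expert judgment is encoded as a Beta prior. This is one
    possible encoding chosen for illustration, not the one used in the study.
    """
    if not 0 <= events <= trials:
        raise ValueError("events must lie between 0 and trials")
    return prior_a + events, prior_b + (trials - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical expert judgment: probability near 0.2, worth 10 pseudo-observations.
prior = (2.0, 8.0)
# Hypothetical quantitative evidence: 1 occurrence in 40 comparable cases.
posterior = beta_update(*prior, events=1, trials=40)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

As data accumulate, the posterior is driven less by the expert prior and more by the observations, which mirrors the intermediate stages between qualitative and quantitative models described above.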
The scope of the formalism of BNs for managing conditional probabilities is determined by the set of "observables" in each inference, i.e., the environment of each qualitative decision. This leads, in a sense, to the notion of a risk scenario. An event tree is established whose final node is the parameter or function to be determined qualitatively; it is formed by the "conditions" under which that parameter (Darcy permeability, K, or intrinsic permeability, k, for example) or function (permeability/conductivity as a transfer function) fits into the next higher-order scenario: for instance, the permeability in an aquitard (caprock) or in an aquifer (storage formation), and these in turn in the actual path of CO2 in the CGS. The same applies to all parameters, functions and processes. However, given that the aim is to infer probabilities, it is not a tree of events but of probabilities of events; for example, the probability that the permeability K takes values larger than a set value x, i.e., P(K > x), rather than its unknown actual value. Surely, throughout the project, its real value will become better known. The project will evolve both in decision-making and in the characterization of the storage complex, and both activities are interrelated and affect each other. Therefore, the PRA system must be "dynamic" and "historical", so that it can be modified as the level of information improves, the evaluation improves, or new conceptual models of the behavior of the storage or of its subsystems and/or components are accepted.
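A probability of exceedance such as P(K > x) can be estimated by Monte Carlo sampling once a distribution for K is assumed. The following sketch assumes, purely for illustration, that K is lognormal; the distribution, its parameters and the function names are hypothetical and do not come from the site data.

```python
import math
import random


def exceedance_probability(threshold, mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(K > threshold) for a lognormal K.

    mu and sigma are the mean and standard deviation of ln(K).
    The lognormal form is an assumption made for this example only.
    """
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_samples) if rng.lognormvariate(mu, sigma) > threshold
    )
    return hits / n_samples


def exceedance_exact(threshold, mu, sigma):
    """Closed-form P(K > threshold) for the same lognormal, as a cross-check."""
    z = (math.log(threshold) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical parameters: the threshold equals the median of K, i.e. exp(mu).
estimate = exceedance_probability(threshold=1.0, mu=0.0, sigma=1.0)
reference = exceedance_exact(threshold=1.0, mu=0.0, sigma=1.0)
print(estimate, reference)
```

When the distribution is later replaced by one fitted to site measurements, only the sampling line changes, which fits the "dynamic" character required of the PRA system.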
The proposed methodology consists in the application of Bayesian networks (BNs) to calculate the probability of risk of leakage in a geological storage of CO2. To carry out this task, the system was first conceptualized from the point of view of the risk of leakage and of the characteristics that favor or prevent its mitigation. This leads to a partitioning of the system into three subsystems:
1. Primary subsystem, which measures the potential of the target formation for the long-term containment of CO2.
2. Secondary subsystem, which measures the potential for containment by other formations present in the storage complex in the event that the target formation is not able to ensure CO2 containment.
3. Tertiary subsystem, which measures the potential of the site to attenuate or disperse a leakage of CO2 if the primary formation leaks, the secondary subsystem is not able to retain the release, and the CO2 reaches the soil and atmosphere.
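The three subsystems can be read as a chain of barriers, each of which matters only if the previous one has failed. A minimal sketch of that chain as a three-node Bayesian network is given below; the chain structure as coded, the node names and all probabilities are hypothetical simplifications and do not reproduce the actual model of the study.

```python
from itertools import product

# Hypothetical conditional probabilities, for illustration only.
P_PRIMARY_FAILS = 0.10
# P(secondary fails | primary state): no release to retain if the primary holds.
P_SECONDARY_FAILS = {True: 0.30, False: 0.0}
# P(tertiary fails to attenuate | secondary state).
P_TERTIARY_FAILS = {True: 0.50, False: 0.0}


def bernoulli(p_true, value):
    """Probability of a binary outcome given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(primary, secondary, tertiary):
    """Joint probability from the local factorization of the chain."""
    return (
        bernoulli(P_PRIMARY_FAILS, primary)
        * bernoulli(P_SECONDARY_FAILS[primary], secondary)
        * bernoulli(P_TERTIARY_FAILS[secondary], tertiary)
    )


def marginal_tertiary_failure():
    """P(unattenuated release at the surface), summing out the other nodes."""
    return sum(joint(p, s, True) for p, s in product([True, False], repeat=2))


print(marginal_tertiary_failure())
```

With these illustrative numbers the marginal is simply the product of the three failure probabilities along the chain; a fuller network would add the geological attributes of each subsystem as parent nodes.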
The model considers and establishes relationships between the variables and attributes that describe each of the indicated subsystems. Owing to the lack of data in the initial stages of the project, the model will require the use of qualitative values for estimating the probability of leakage risk. However, these assessments will be expressed numerically [13]. This is, first, because there are significant variabilities and overlaps when verbal expressions are used.
In addition, the interpretations of these verbal expressions of probability vary between contexts, even though they may remain constant for each evaluator [14][15][16].
Two values will be assigned to each variable or attribute of the system. The first is based on a judgment of how this variable behaves with regard to the probability of leakage. The second assesses the certainty provided by the source that underpins the first value. In practice, this means that each variable has two values that define the range in which the true value of the variable is likely to lie. Finally, two models are obtained that define the upper and lower bounds of the leakage probability from the storage system.
For the assignment of the pairs of values to each element (qualitative probability, associated certainty), given the current qualitative level of the study, a discrete coding based on the Pedigree scheme [18] has been chosen. This scheme aims to "assess the reliability of the information, relying on users' purposes" [17], that is, to evaluate the reliability of information according to its different purposes of use [19], while also examining the process by which that information was produced. To this end, the Pedigree scheme is expressed through a set of criteria, for example the empirical basis or the degree of validation, many of which are very difficult to measure objectively. To minimize arbitrariness and subjectivity in the measurement of validity, the qualitative expert judgments for each criterion are encoded on a discrete numerical scale from "0" (weak) to "4" (strong), with linguistic descriptions (modes) of each level on the scale [20] (see Table 2).
In the proposed methodology the same criteria have been applied: the qualitative probability has been coded on a scale from "0" (leakage probability equal to "1") to "4" (leakage probability equal to "0"), and the degree of certainty on a scale from "0" (assumption weakly based on objective data) to "2" (existence of reliable measurement data). With the information available, it was considered that including more levels would only bring more subjectivity to the study.
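The coding can be sketched as a mapping from a (score, certainty) pair to a range of qualitative probability. Only the endpoints of the two scales are stated in the text; the linear interpolation between them and the way certainty widens the range are assumptions introduced here for illustration, as are all names in the code.

```python
def qualitative_probability(score):
    """Map a score in [0, 4] to a qualitative leakage probability in [0, 1].

    Score 0 means leakage probability 1 and score 4 means 0, as in the text.
    The linear interpolation between these endpoints is an assumption.
    """
    if not 0 <= score <= 4:
        raise ValueError("score must lie between 0 and 4")
    return 1.0 - score / 4.0


# Hypothetical half-width of the range, in score units, for each certainty level
# (0 = weakly based on objective data, 2 = reliable measurement data).
HALF_WIDTH_BY_CERTAINTY = {0: 1.0, 1: 0.5, 2: 0.0}


def probability_range(score, certainty):
    """Return (lower, upper) qualitative leakage probability for one variable."""
    half_width = HALF_WIDTH_BY_CERTAINTY[certainty]
    low_score = max(0.0, score - half_width)
    high_score = min(4.0, score + half_width)
    # A higher score means a lower leakage probability, so the bounds swap.
    return qualitative_probability(high_score), qualitative_probability(low_score)


print(probability_range(3, 2))  # well-measured variable: zero-width range
print(probability_range(3, 0))  # weakly supported variable: wide range
```

Feeding all lower bounds into one model and all upper bounds into another gives the two bounding models mentioned above.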
The model evaluates the combined probability of leakage from the storage (primary) subsystem and from the saturated medium in the storage complex located above the caprock (secondary subsystem), as well as the leakage attenuation capacity of the surface (soil, unsaturated water and the lower atmosphere). In this study, the tertiary subsystem has been decoupled from the other subsystems because of the profound lack of information currently available about it. However, its inclusion is important because it makes it possible to incorporate the level of uncertainty that it introduces into the global model. The global model is represented in Fig. 1, and Figs. 2, 3 and 4 show details of its organization in partial subsystems. The model is composed of the variables listed in Table 1.
It should be noted, finally, that the estimated values of leakage risk probability obtained with qualitative variables are not probability values in themselves but a factor of them (in arbitrary units), called the qualitative probability, which satisfies the following expression:
$$P(x) = K_c \, P_c(x)$$
where P(x) is the probability value, K_c is a scaling constant and P_c(x) is the qualitative probability value. The scaling constant K_c is obtained by validating and calibrating the model against experimental data. Although this factor may not be known, it is still possible to compare multiple models based on qualitative data normalized according to the fundamental axioms of probability. If the model combined quantitative and qualitative variables, it would be necessary to obtain an estimate of the scale factor. This would be done by applying quantitative calculation models, so that the system for estimating the leakage probability could operate with both types of estimate.
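The role of the scaling constant can be illustrated with a short sketch. It assumes that one reference case exists for which both a quantitative probability and a qualitative probability are available; that calibration procedure, the function names and the numbers are hypothetical.

```python
def scaling_constant(reference_probability, reference_qualitative):
    """Estimate K_c from one reference case where both values are known."""
    if reference_qualitative <= 0:
        raise ValueError("qualitative probability must be positive")
    return reference_probability / reference_qualitative


def to_probability(qualitative, k_c):
    """Convert a qualitative probability P_c(x) into P(x) = K_c * P_c(x)."""
    probability = k_c * qualitative
    if not 0.0 <= probability <= 1.0:
        raise ValueError("scaled value is not a valid probability")
    return probability


def relative_risk(qualitative_a, qualitative_b):
    """Ratio of two qualitative probabilities; K_c cancels out of this ratio."""
    return qualitative_a / qualitative_b


# Hypothetical calibration: a quantitative model gives 0.01 where P_c = 0.5.
k_c = scaling_constant(0.01, 0.5)
print(to_probability(0.25, k_c))
print(relative_risk(0.5, 0.25))
```

The last function shows why models built only on qualitative data can still be compared with one another while K_c remains unknown.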

Results and discussion
This methodology has been applied to the study area of Huérmeces (Burgos, Spain) (see Fig. 5). The study site is basically a small dome 1,200 m deep (Figs. 6, 7) with high gradients in its geological features, which makes the experiments easier to carry out in terms of both time and cost. The proposed methodology is applied through the BN model constructed for the Huérmeces area to estimate the risk of leakage. The storage formation that best meets the geological requirements for storing CO2 in a supercritical phase in the area would be the Huérmeces clastic unit called Unidad Clástica del Lías (Lias Clastic Unit), which lies below the loamy Middle Lias seal formation and above the basal anhydritic Lias (Carniolas) seal formation and the Triassic Keuper saline materials. This Unit is composed of limestones interbedded with dolomitic limestones and dolomites. Its upper part comprises 21 m of sandy limestones that characterize the Unit, which was the main target of the exploration boreholes drilled in this area for hydrocarbon reserves evaluation. The most important structure analyzed in the IGME Montorio geological map sheet during hydrocarbon exploration surveys is associated with the so-called Falla de Hontomín (Hontomín Fault), beneath the town of the same name. This structure has been interpreted from seismic studies as a faulted block inclined against a graben structure. It occupies an area of 32 km² with a vertical structural closure of 475 m [21]. Within it, the Lias Clastic Unit appears at depths of 1,582, 1,353 and 1,238 m in the Hontomín-1, Hontomín-2 and Hontomín-3 boreholes, with thicknesses of 114, 92 and 62 m, respectively, the last one being affected by a fault or fracture zone [22][23][24].
This Unit also appears in other structures analyzed in the IGME Montorio geological map sheet, which means that it has regional lateral continuity and, therefore, much more storage capacity than that associated with the Falla de Hontomín structure.
The seal material above this Unit consists of a lithologic package dominated by marls and clayey limestones. It also appears to have regional lateral continuity, with thicknesses of 40, 115 and 94 m in the Hontomín-1, Hontomín-2 and Hontomín-3 boreholes, respectively, the first of which is affected by a fault [25]. Beneath this seal material, in the Lias Clastic Unit, hydrocarbon traces have been found in boreholes Hontomín-1 and Hontomín-3, and a total accumulated production of 2,939 barrels of oil has been extracted from the Hontomín-2 borehole, which indicates the effectiveness of this material as a seal.
This site was previously assessed with a recognized methodology for the selection and classification of formations based on the analysis of the HSE risks arising from CO2 leaks, which allows the two methodologies to be compared [26,27]. This comparison acts as a first validation of the method until experimental data allowing a direct validation become available. The results indicated that both are consistent and that the quality of the study site is rated as intermediate to good for CO2 storage (see Fig. 8). Nevertheless, the present methodology goes further, since it also yields qualitative probability functions for the subsystems and a qualitative probability function for the global risk.
The application of the proposed methodology to the Huérmeces area takes the form of the construction and estimation of the model in Fig. 1, based on the theoretical framework of BNs and implemented in GoldSim, a Monte Carlo simulation software package for dynamically modeling complex systems.
The qualitative probability density function for the complete system is presented in Fig. 8, which includes the confidence ranges. The methodology also allows sensitivity analyses to be performed. Comparing the contribution of the uncertainty of each input parameter to the uncertainty of the results establishes the relative importance of the variables, and therefore of the subsystems, in the total system uncertainty. The simplified results of the sensitivity analysis are shown in Fig. 9, where only subsystems one and two are taken into consideration.
Comparing the relative importance of the contribution of the variables of the different subsystems to the total system uncertainty shows the clear dominance of the secondary subsystem. This means that, to achieve a significant reduction in total system uncertainty, knowledge of these parameters must be improved. The methodology therefore supports decisions about where future studies of the system should be directed.
In this case, a greater effort is needed to improve the characterization of subsystem two and, with it, the knowledge of the system.
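A sensitivity analysis of this kind can be sketched with a small Monte Carlo experiment. The example below ranks two inputs by their squared Pearson correlation with the output, normalized to sum to one. This importance measure, the input ranges and the product model are assumptions chosen for illustration; they are not the measure or the data used in GoldSim for the study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def uncertainty_contributions(n_samples=20_000, seed=1):
    """Share of output uncertainty attributed to each input (hypothetical model)."""
    rng = random.Random(seed)
    # Hypothetical inputs: the secondary subsystem is far less well constrained.
    primary = [rng.uniform(0.08, 0.12) for _ in range(n_samples)]
    secondary = [rng.uniform(0.05, 0.55) for _ in range(n_samples)]
    # Hypothetical output: both barriers must fail for a release to pass.
    output = [p * s for p, s in zip(primary, secondary)]
    r_squared = {
        "primary": pearson(primary, output) ** 2,
        "secondary": pearson(secondary, output) ** 2,
    }
    total = sum(r_squared.values())
    return {name: value / total for name, value in r_squared.items()}


print(uncertainty_contributions())
```

With these illustrative ranges the poorly constrained input dominates the output uncertainty, which is the kind of result that directs further characterization effort toward one subsystem.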
The model also provides a platform for the progressive integration of the data being obtained as a result of the progressive characterization of the system. This is essential to make the transition from a qualitative model to a quantitative one through the progressive integration of quantitative data in numerical and/or analytical models.

Conclusions
The proposed methodology represents a new approach to the problems of risk assessment of geological storage of CO2 within a framework of safety and protection of health and the environment. Developing models based on Bayesian networks to describe these systems has the disadvantage of not being an easy task. However, such networks are an attractive tool because of the natural way in which they connect items, their simplicity of maintenance, and because they allow decision-making under conditions of uncertainty. Furthermore, given its conceptual development, this methodology supports activities in the risk assessment of any proposed geological storage of CO2, such as mathematical analysis (areas of maximum and minimum variation, stability zones, etc.) or sensitivity analysis to estimate both the impact of the different variables on the uncertainties of the system and the level of uncertainty of the different conceptual models, which are key issues for the treatment of such uncertainties. From the development and application of this methodology it can be concluded that it allows the probability of CO2 leakage risk of potential areas or sites for the geological storage of CO2 to be assessed with only partial knowledge of their characteristics, based on qualitative data and the estimation of qualitative probability. The results of applying this methodology to the Hontomín site lead to the conclusion that it can be classified as having a medium level of leakage risk with a medium-high level of associated uncertainty. The sensitivity study performed indicates the need for a better characterization of the secondary subsystem to substantially reduce the level of uncertainty.
Fig. 8 Results of the application of the methodology
Fig. 9 Contribution of the variables to the system uncertainty
The application of both the proposed methodology and the internationally recognized Selection and Classification of Formations (SCF) methodology to the Hontomín area, and the subsequent comparison of results, led to the conclusion that both agree on the classification of the study area. However, the SCF methodology ends with the qualitative assessment. In contrast, this qualitative assessment is the starting point for the proposed quantitative methodology, which makes it possible to move progressively towards a purely quantitative model, based on the relationships between the variables set by the Bayesian network model and the gradual incorporation of new quantitative data.