Accelerated reactive transport simulations in heterogeneous porous media using Reaktoro and Firedrake

This work investigates the performance of the on-demand machine learning (ODML) algorithm introduced in Leal et al. (Transp. Porous Media133(2), 161–204, 2020) when applied to different reactive transport problems in heterogeneous porous media. This approach was devised to accelerate the computationally expensive geochemical reaction calculations in reactive transport simulations. We demonstrate that even with a strong heterogeneity present, the ODML algorithm speeds up these calculations by one to three orders of magnitude. Such acceleration, in turn, significantly advances the entire reactive transport simulation. The performed numerical experiments are enabled by the novel coupling of two open-source software packages: Reaktoro (Leal 2015) and Firedrake (Rathgeber et al. ACM Trans. Math. Softw.43(3), 2016). The first library provides the most recent version of the ODML approach for the chemical equilibrium calculations, whereas, the second framework includes the newly implemented conservative Discontinuous Galerkin finite element scheme for the Darcy problem, i.e., the Stabilized Dual Hybrid Mixed(SDHM) method Núñez et al. (Int. J. Model. Simul. Petroleum Industry, 6, 2012).


Introduction
The challenge in coupling physical and chemical processes lies not only in deriving the working mathematical model but also in solving it numerically. There is an excessive computational load, most of which often results from the geochemical reaction calculations. At the same time, the importance of modeling such coupled processes has significantly increased over the recent years. They have become essential for interpreting the phenomena occurring in the surface or subsurface systems, as well as engineering and environmental problems. Real-life applications include a broad diversity of reactive chemical processes. Among many are rock/mineral alteration in natural diagenetic systems as a result of the carbon capture and storage (CCS), geological sequestration, and chemical stimulation of geothermal reservoirs, injection of acid gases resulting in groundwater contamination, and enhanced oil and gas recovery, transport, and storage of radiogenic and toxic waste products in geological formations (see [91,93,102] and references therein). The alterations in fluid and rock composition cause changes in the rock porosity and permeability, fluid density and viscosity, all of these ultimately affecting how fluids, heat, and chemical species are transported.
Depending on the formulated reactive transport model, chemical reaction calculations (i.e., chemical equilibrium and kinetics) can be responsible for most of the computing costs in the simulation. Achieving a more balanced cost distribution remains a demanding area of research, given the nature of the computations for the complex chemical system. Simulation of the physical processes typically leads to systems of linear equations that follow from the discretized partial differential equations (PDEs) governing physical laws. To simulate chemical processes, in turn, one needs to compute thermodynamic properties of a substantial number of species employing complex equations of state and calculate chemical equilibrium/kinetic states by solving a system of non-linear differential-algebraic equations. These operations must be repeated in all discretization cells of the computational domain, at each reactive transport step.
Even though the ODML method contrasts with classical machine learning methods, we acknowledge the body of the literature that reports efforts to apply machine learning to the simulations of porous media. This challenging application has been investigated in the different contexts, i.e., creating high-resolution data from low-resolution images, micro-macro modeling, correlation of the hydraulic conductivity field and the longitudinal macro-dispersivity [108], two-phase upscaling computations, characterization process for the flow simulations in underground fractures [8], particle transport and filtration in porous media [67], and well placement for geological carbon storage. Machine learning methods have significantly penetrated the subject of the uncertainty propagation for multiphase flow, i.e., from the point of view of data-driven approaches based on the labeled data [71,96,109] and by the so-called physics-informed neural networks (PINN) methods [27,84,110]. We also acknowledge [95] with the references therein for a systematic review of the recent development and applications of domain-aware scientific machine learning in geosystem research. When it comes to solving multiphase flow in heterogeneous porous media, [104] introduces a physics-constrained deep learning model to learn such modeling using a customized U-Net and a surrogate model for predicting well's flow rate. The ML-based approach performs prediction with a speedup of about 1400 times compared to physics-based simulations. Finally, in [100], the so-called CCSNet was proposed as an alternative to conventional numerical simulators for CCS problems. The authors report that the CCSNet produces all the outputs that a conventional simulator typically provides but about 100 times faster.
Despite that, in all the aforementioned literature, machine learning applications for geochemical speciation and equilibrium calculations are still incipient. All the proposed techniques rely on the training stages with data set generation and configuration and aim at accelerating flow and reactive transport simulations as one. Since the chemical equilibrium is computationally intensive, machine learning methods (aiming to accelerate speciation and equilibrium calculations for reactive flow problems) can provide a substantial improvement to decrease the demanded computational time. In [58], the ODML method is proposed to fill this gap. In contrast with all the proposed techniques employed in cited references above, the ODML algorithm accelerates the geochemical calculations only. It does not require training in advance but rather learns on the fly throughout the simulation, achieving considerable speedups using learned calculations and first-order Taylor approximations.
In this work, we present a novel coupling of the packages Reaktoro [51] and Firedrake [85] for modeling reactive transport processes with the on-demand machine learning (ODML) acceleration strategy [58]. This strategy can substantially speed up the geochemical calculations in reactive transport simulations by orders of magnitude. ODML is able to learn essential chemical equilibrium states during the simulation so that subsequent similar equilibrium states can be calculated considerably faster. The on-demand learning is triggered only when the previously learned equilibrium calculations are unable to predict the result of a new calculation. By detecting key chemical equilibrium calculations, the ODML can quickly predict the vast majority of equilibrium states in a reactive transport simulation instead of fully computing them. However, many questions related to the ODML method's extension to more challenging problems remained out of the scope of [58]. This paper provides a detailed study of the ODML capabilities to substantially accelerate reactive transport simulations with complex chemical systems and/or geologic features, i.e., two-dimensional (2D) porous media with heterogeneity. In particular, due to heterogeneous permeability fields, the presented examples include a substantial change in the chemical composition through the domain, making the learning of the ODML more demanding throughout the simulation. We also note that this paper only studies the time complexity of the ODML algorithm (and its decrease), not the spatial complexity (storage and memory).
Note Comprehensive evaluations conducted during the learning stage will also be referred to as conventional or full evaluations of chemical equilibrium states throughout the paper. These operations can be performed by either of conventional algorithms, i.e., the Gibbs energy minimization (GEM) approach or the law of mass action (LMA) equations [90]. In contrast, fast approximations will be alternatively referred to as smart estimations/predictions and employ only matrix-vector multiplication.
Note The on-demand machine learning (ODML) strategy applied in this paper does not rely on deep learning algorithms. The ODML efficiently stores the results of previous key equilibrium reaction calculations (as well as corresponding sensitivity matrix computed by automatic differentiation) and then employs these collected results for predictions, avoiding thus similar expensive computations at other spatial points and time steps. Thus, the ODML does not use neural networks (e.g., convolutional neural network) nor surrogate models. The ODML exploits, however, some machine learning strategies such as knearest-neighbors (kNN) and clustering algorithms, which can be viewed as non-supervised learning schemes. The on-demand clustering strategy within ODML classifies the learned calculations based on essential characteristics of the obtained chemical states (e.g., the major/primary species at that state). For more details, we refer the reader to the original work of [58] on the ODML algorithm.
Reaktoro is a computational framework developed in C++ and Python for modeling chemically reactive processes governed by either chemical equilibrium or/and kinetics. For chemical equilibrium calculations, Reaktoro implements numerical methods based on the GEM [52,54,57] or the extended LMA (xLMA) formulation [55]. For chemical kinetics calculations, assuming partial chemical equilibrium considerations, Reaktoro relies on the approach presented in [53]. It adopts an implicit time integration scheme for enhanced stability combined with an adaptive time-stepping strategy for efficient simulation of chemical kinetics. The on-demand machine learning (ODML) algorithm for faster chemical equilibrium calculations was introduced recently in Reaktoro [58].
Firedrake is an open-source library for solving PDEs with finite element methods (FEM). It is employed here to solve the equations that govern (solute/heat) transport processes (advection/diffusion equations) and fluid flow (i.e., the Darcy equation). The package uses a high-level Unified Form Language (UFL) embedded in Python [2], which provides symbolic representations of variational problems corresponding to PDEs that govern physical laws.
Note: It is worth remarking that a similar software coupling is relatively straightforward with another open-source computing platform for solving PDEs, e.g., the FEniCS Project [62,63]. Examples using such coupling can be found in [22]. Besides Firedrake and the FEniCS Project, Reaktoro has also been coupled with OpenFOAM [42] to produce the pore-scale reactive transport simulator poroReact [79].
This study has the following structure. Section 2 introduces physical and chemical processes considered for the reactive transport simulations and corresponding governing equations. Section 3 provides an overview of the numerical methods used in Reaktoro and Firedrake. Section 4 describes two reactive transport simulations conducted with Reaktoro and Firedrake and discusses the numerical performance of the ODML approach when applying it to 2D heterogeneous problems. In Section 5, we summarize the obtained results, draw conclusions, and discuss future work.

Governing equations
This section presents the governing equations for the physical and chemical processes computationally simulated in Section 4. Due to the complexity of each subprocess, our presentation is organized into three parts: • Section 2.1: single-phase fluid flow in porous media; • Section 2.2: reactive transport of the fluid species; • Section 2.3: chemical reactions among fluid species and rock minerals.

Single-phase fluid flow in porous media
We consider a relatively simple model for fluid flow for the purpose of assessing the potential of the ODML algorithm to accelerate chemical equilibrium calculations in reactive transport simulations. This is because our interest in this study is to evaluate how ODML behaves considering more complex chemical systems and heterogeneous porous media. In what follows, we assume the fluid is incompressible, gravity effects are negligible, and the porous medium is isotropic and nondeformable. Given these assumptions, we solve the coupled continuity equation and the Darcy equation below to compute the fluid pressure P and fluid Darcy velocity u throughout the medium: Here, and μ are the fluid density and the dynamic viscosity, f corresponds to the fluid injection/production rate, κ = κ(x), x ∈ , represents the (isotropic) permeability field of the porous rock, denotes the physical domain, and t final indicates the final simulation time.

Reactive transport of the fluid species
Mass conservation applies to the chemical species in the fluid as they advect, diffuse, disperse, and react with rock minerals in a porous medium. We apply the mathematical formulation presented in Appendix 4 of [58] for the reactive transport of fluid species in terms of mass conservation of chemical elements and electric charge. This approach is a standard procedure in the literature that substantially reduces the number of PDEs to be solved for transport phenomena [60]. The formulation also accounts for the dissolution and precipitation of rock minerals. For convenience, we write below the governing equation: where b f i and b s i are the bulk concentrations (mol/m 3 ) of elements (chemical elements and charge) in the fluid and the solid partitions, respectively. Here, v is the mean percolation velocity (in each cell) related to the Darcy velocity through the porosity φ, i.e., v = u/φ, and D is the dispersion-diffusion tensor [82], i.e., where α mol is the molecular diffusion coefficient as well as α l and α t are the longitudinal and the transversal dispersion coefficients, respectively.

Note:
We assume α l and α t be both zero so that D ≡ α mol , which suffices for the numerical investigations of the ODML approach performance in Section 4. Equation 3 possesses two advantages when compared to the transport equation formulated for chemical species: • Absence of the reaction term in the convectiondiffusion equation, leaving such concerns to a separate chemical kinetic/equilibrium solver and easing the coupling procedure.
• A considerable decrease in the number of unknowns, as the number of elements E is typically considerably smaller than the number of species N (see Eq. 3).
Note: For more than one fluid phase (each with its velocity field) and different diffusion coefficients for the fluid species, this simplified transport formulation becomes less straightforward.

Chemical reactions among fluid species and rock minerals
In our simulations, homogeneous and heterogeneous chemical reactions among the species are considered. We adopt a local chemical equilibrium model so that both fluid species and rock minerals are in chemical equilibrium at any point in space and time. Because of transport processes and variations in temperature/pressure (when applicable), the chemical equilibrium states are continually altered at each point of the domain. For example, a rock mineral may gradually dissolve as the more acidic fluid passes through that point in space. In this manner, at every discretized point in space, we solve the Gibbs energy minimization problem: to compute the chemical equilibrium amounts n = (n 1 , . . . , n N ) of the species distributed among all fluid and solid phases in the chemical system. Vector n includes the aqueous species amounts (solute and solvent water) and the mineral amounts composing the porous rock. Note: The given input of this chemical equilibrium problem is composed of the temperature T , pressure P , and the amounts of chemical elements b = (b 1 , . . . , b E ) (including electric charge), whereas the constraints include the mass balance An = b and non-negativity of the species amounts n ≥ 0. In (5), T is assumed to be constant throughout the medium, P computed via the solution of the continuity and Darcy equations, and b is updated over time via (3). For more information on solving the GEM problem, we refer to [57,58].

Numerical methods
In what follows, we consider the numerical methods required to solve the governing equations presented earlier.
To solve the fluid flow through a heterogeneous porous medium, we use a highly mass conservative finite element method (FEM) to accurately capture the velocity field. The transport equations are solved with a suitable FEM that handles advection-dominated flow. For the multiphase chemical equilibrium calculations, involving the fluid and solid species, we employ the Gibbs energy minimization algorithm (GEM) [54,57]. In addition to these numerical methods, we provide a brief review of the on-demand machine learning algorithm (ODML) presented in [58]. It is applied to accelerate several millions of time-consuming equilibrium calculations in the reactive transport simulation course. This total number of calculations results from the necessity to compute equilibrium states at each mesh point (or degree of freedom (DOF) in the finite element naming convention) during each time step.

Time staggered operator splitting steps
To solve the time-dependent reactive transport equations in Eq. 3, we apply the fully discrete formulation typically resulting from a combination of the finite difference approximation in time with the finite element approach in space. We use index k to denote the time-step and t = t k+1 − t k to determine the time-step size. We assume the uniform discretization where t final > 0 is the total time. The operator splitting scheme using classical sequential non-iterative approach (SNIA) for solving these transport equations is described below: We assume the flux boundary condition on the inlet face of boundary inlet ⊂ ∂ , where we inject the brine, The right boundary is considered a free outflow boundary. Vector n denotes the outward-pointing normal to the boundary face (n inlet is the normal on inlet ), andb i, inlet (in mol/m 3 fluid ) represents the given concentration of the injected brine. As a space discretization solver for Eq. 6, we employ the Streamline-Upwind Petrov-Galerkin (SUPG) scheme introduced in [13] (see Section 3.3) to handle advection-dominated transport in a accurate and stable way.
The velocity in Eq. 6 is generated from the coupling to the Darcy problem, i.e., u = u k , where u k satisfies the system A comprehensive numerical analysis, demonstrating the existence and uniqueness of the solution for the prior semi-discrete system, can be found in [65,66]. II. Evaluate the total element concentrations b i , using prior reconstructedb f, k+1 In each space discretization cell, calculate n k+1 for given T, P, and b k+1 using the smart chemical equilibrium algorithm accelerated with the ODML strategy (see Section 3.4).
To make sure that the Courant-Friedrichs-Lewy (CFL) condition is satisfied, we assume CFL = 0.3 and the time step is defined by where , and x and y are the lengths of the cells along the x and y coordinates, respectively.
Note: As it is mentioned above, we use the sequential noniterative approach (SNIA) to couple transport and chemical codes. Carrayrou et al. [14] provides the comparison of different coupling schemes for a reactive transport benchmark, where the SNIA proves to be an efficient scheme with a reasonable error magnitude. According to [16], the SNIA approach works well for problems with advection and has a good balance between accuracy, computational time, and transparency of implementation.
These results provide us with the confidence to use the SNIA in more realistic reservoir simulation cases. Future works on extending Reaktoro and Firedrake coupling will use two-way coupling to ensure the proper handling of the mass conservation. It would be interesting to study the transport and chemical equilibrium coupling performance for the famous benchmarks. However, it lies beyond the scope of Reaktoro package, which focuses mainly on modeling realistic chemical systems with corresponding thermodynamical databases.

SDHM method for the fluid flow in porous media
In the following, we describe in short the relevant part of the finite element method (FEM) applied in the present work. We use a conservative FEM to obtain accurate velocity fields that satisfy mass conservation, a critical numerical feature for transport problems in heterogeneous media. The formulation is based on stabilized mixed finite element methods [12,21,68] combined with hybridization techniques [18,19]. The resulting Stabilized Dual Hybrid Mixed (SDHM) method [76] has all Discontinuous Galerkin (DG) desirable features while requiring fewer degrees of freedom (DOF) due to the static condensation procedure. The discretized global system is solved for Lagrange multipliers only (defined on the mesh skeleton). The solution for pressure and velocity fields, in turn, is recovered by the element-wise post-processing of these multipliers' solution.
To present the SDHM formulation used in this work, we introduce the classical L 2 inner products for an arbitrary regular domain ⊂ R n (n = 2, 3) as: in which p and q represent scalar functions and z and w vector-valued functions defined on .
To derive the weak formulation of the Darcy problem (2) in the heterogeneous medium, we follow the lines of [78].
denote a subset of interior edges. Thus, we obtain the following weak form of Eq. 2: stand for the local functional spaces, n K denotes the outward pointing normal to ∂K , andp is the trace of pressure on skeleton. We assume that κ is at least invertible element-wise.
Then, the week formulation on each K ∈ T h reads as follows: Utilizing the notation and formulation from [4], to obtain the pressure trace we solve the global dual hybrid mixed problem: find u ∈ W : in which D ⊂ ∂ and N ⊂ ∂ ( D N = ∂ and D N = ∅) are the sub-domains for pressure and flux boundary conditions, respectively.
In Eq. 9, each Lagrange multiplier λ corresponds to the pressure trace on K ∈ T h satisfying the decomposition p = λ + p D , i.e., λ is solved for internal edges and p D is prescribed on boundary edges [19,75]. The third equation of Eq. 9, known as transmission condition [19] or flux continuity of the Darcy velocity field normal component. Note that flux boundary conditions are also weakly imposed through the transmission condition.
To stabilize (9) and obtain (locally) adjoint consistent formulation, we add respective stabilization terms, i.e., These least square residual forms correspond the mass balance, the Darcy, and the curl of Darcy's laws [20] together with the Lagrange multiplier introduced in [5].
Then, the SDHM formulation reads as: find u ∈ W := where ∇× is the curl operator and β := β 0 h ≥ 0 is the stabilization factor with β 0 ∈ R.
The first and second equations in Eq. 10 generate the two local problems and the third the global problem. Subsequently, we can approximate u, p, and λ by approximations from the broken function spaces, i.e., Lagrangian finite element spaces. Here, P k is the polynomial set with m < k if K is a triangular element, or m = k if e represents an edge, or m ≤ k in each Cartesian variable if K is represented by a quadrilateral element (here, k = l, m, or s). The SDHM method is known with its consistency and flexibility in the approximation spaces choice. It also provides optimal rates of error convergence [76], stable for any value β, including β = 0 [77], and is locally conservative for l = m = s. The multiplier choice as the pressure trace is essential to ensure that Eq. 10 is solvable for (u h , p h ) ∈ W m h ×Q l h as functions of λ h ∈ M s h . Moreover, it results in a computationally efficient way to solve a global system with only the trace variable as unknown due to static condensation procedure.

SUPG method for the semi-discrete element-based transport problem
To obtain the approximation of the semi-discrete transport problem, we use the Streamline Upwind Petrov-Galerkin (SUPG) method. Usually, it is applied to advectiondominated partial differential equations to suppress numerical oscillations present in the classical Petrov-Galerkin method for this class of a problem [13]. Discretizing (3) in time yields the implicit time-stepping scheme: To concentrate on the SUPG approximation scheme, we omit the index f indicating the amount of elements in the fluid species and the index j = 1, . . . , E, numbering the elements. Then, the fully discrete approximation of Eq. 11 reads as: find b k+1 ∈ X p h satisfying the bilinear form We omit the sub-index of domain in some of the L 2 inner-products, i.e., (u, v) ≡ (u, v), and assume the approximation space X p h to be a continuous Lagrangian finite element space of degree p ≥ 1. The stabilizing term S(b k+1 h , η) to the standard Petrov-Galerkin variational formulation is defined as with the stabilization parameter

It is dependent of the Peclet number Pe
, where c inv stands for the inverse constant of finite element spaces.

ODML method for faster chemical equilibrium calculations
Finally, we briefly describe the smart chemical equilibrium calculations method introduced in [58], which combines a classical algorithm for chemical equilibrium with an ondemand machine learning (ODML) strategy that accelerates the calculations by one to three orders of magnitude. Let the process of solving (5) (i.e., the problem of computing a chemical equilibrium state) be represented in the abstract functional notation: where f represents the function that performs the necessary steps towards a solution, and x and y are the given and result vectors defined as Here, x comprises given triple (T , P , b) introduced in Section 2.3. Besides n, the output vector y also contains the chemical potentials μ = (μ 1 , . . . , μ N ) of the species. Put differently, y contains the information on the final speciation of the chemical system and thermochemical properties at that final state. Assume that f was evaluated previously with input conditionsx, and (x,ẙ,ẙ x ) is stored for the future use, where y = f (x) andẙ x := ∂f/∂x denote the values and Jacobian (or sensitivity) matrix of f . Such sensitivities can be utilized to predict new chemical equilibrium states quickly and relatively accurately in the vicinity of (x,ẙ,ẙ x ). In particular, we use a first-order Taylor extrapolatioñ whereỹ serves as an estimate of y. In another words, we can useẙ x to estimate how the species amounts in the final state would change when small perturbations are applied to T , P , and b. Note: Computing this sensitivity matrix efficiently and accurately is far from trivial, and we can accomplish this via the use of automatic differentiation [56].
Once the predicted outputỹ is calculated, it must be verified for acceptance. For this, we introduce a function g(ẙ,ỹ) < ε that assess whetherỹ provide a sufficiently accurate approximation of the exact output y = f (x) with a preselected tolerance ε > 0. The acceptance test function g(ẙ,ỹ) (which can vary across applications of the ODML algorithm) resolves to either an acceptance or rejection and does not require the evaluation of a computationally expensive function f . For more details on the acceptance criterion and how the reference elements are stored in priority-based clusters, we refer the reader to [58].

Results
This section investigates the performance of the on-demand machine learning (ODML) strategy when applied to accelerate relatively complex reactive transport simulations. For a more in-depth method description, we refer to [58] and its implementation in the open-source software Reaktoro [51]. This section aims to demonstrate that the ODML enables accelerated reactive transport simulations without reducing accuracy. It also demonstrates the intrinsic capability of ODML in conserving electric charge and mass of chemical elements.
Allan et al. [58] demonstrates the algorithmic and computing features of the ODML method using relatively simple one-dimensional reactive transport simulations. In this work, we consider two reactive transport problems with more complex chemical and geologic conditions in the simulations. The first problem models the dolomitization phenomenon in a rock column, similar to the one discussed in [58]. We undertake this test deliberately to enable comparing the numerical results and obtained computation speedups. The second problem addresses the H 2 S-scavenging of a siderite-bearing reservoir, particularly essential for the oil and gas industry. Activity models and thermodynamic data. To evaluate activity coefficients of aqueous species, we use the Pitzer [32,83], the Debye-Hückel (DH) [24], and the Helgeson-Kirkham-Flowers (HKF) [35][36][37][38] activity models. The activity of solute species CO 2 (aq) is computed using the model of Drummond [25]. To calculate the standard chemical potentials of the species, we utilize the equations of state derived in [34,35,87,97], and [88]. Finally, the density of water (with temperature and pressure derivatives) is computed with the [99] model. Two database files are used to obtain corresponding parameters for thermodynamic calculations, i.e., model parameters for evaluation of the species standard thermodynamic properties for temperatures 0-1000 • C and pressures 1-5000 bar. In particular, for the dolomitization modeling discussed in Section 4.1, we use the supcrt98.xml database of Reaktoro, whereas for the scavenging example in Section 4.2, the supcrt07.xml database is utilized. Both files were generated from the SUPCRT92 [43] database files slop98.dat and slop07.dat, respectively.

Numerical methods and other setup details.
For the numerical investigations presented further in this section, the Darcy velocity and the pressure approximations in Eq. 7 are calculated employing the SDHM method [76]. Instead of updating them at each simulation step, the pair (u, p) is reconstructed only once at the beginning of the reactive transport simulation due to insignificant porosity changes. Furthermore, this work's primal goal is to evaluate the ODML algorithm performance under more challenging conditions. We assume zero source-term in Eq. 7, whereas = 1000.0 kg/m 3 and μ = 8.9 · 10 −4 Pa · s. For the pressure, p inlet = P inlet and p outlet = 0.9P inlet on the left and right boundary of the rock, respectively. The rock's heterogeneous permeability (with preferential flow path) was obtained using the open-source Python package GeoStatTools [72].

Case I: Reactive transport of dolomitization process
Numerical model setup. The first heterogeneous reactive transport model is illustrated in Fig. 1. The vertical and horizontal lengths of the rock are set to 1.6 m and 1.0 m, respectively. By discretizing both dimensions with 100 cells (or 101 FEM nodes), we obtain 10201 degrees of freedom (DOFs). At each DOF, we monitor the system's entire chemical state, i.e., its temperature, pressure, bulk concentrations of the species, and thermochemical properties (species activities, phase densities, phase enthalpies, etc).
To make this numerical example readable and reproducible, we recapitulate all the parameters in Table 1. The initial state is represented by the rock plate having initial 10% porosity and composed with 98% vol of quartz (SiO 2 ) and 2% vol of calcite (CaCO 3 ). The resident brine comprises the NaCl brine (0.70 molal) that remains in equilibrium with calcite and quartz. The resulting pH of the initial state is 9.2. The boundary condition is defined by the NaCl-MgC-CaCl 2 -CO 2 brine of pH=3.1 (see molality in Fig. 1 or Table 1) injected on the left side of the rock. As a result, the obtained chemical system can be characterized by elements, species, and phases presented in Table 2. Initially, all the calculations are performed using the HKF activity model for the aqueous species. Further on, we equally employ the Pitzer model to compare the acceleration obtained for the various modeling scenarios. The temperature of the resident and injected fluids (corresponding to the initial and boundary conditions, respectively) is 60 • C. For the inlet pressure, we consider P inlet = 100 bar. The heterogeneous permeability of the rock is presented in Fig. 2a next to the pore velocity in Fig. 2b reconstructed as the numerical solution of Eq. 7. The diffusion coefficient is fixed to be D = 10 −9 m 2 /s for all fluid species. We note that the setup of this numerical example is selected for illustration purposes only. The natural process (on the geological time scale) is more complex than the thermodynamic equilibrium approach described in the paper, due to kinetic/nucleation hurdles.
To illustrate the complexity of the reaction network modeled by the considered chemical system, we list the most important reactions below: As one can see, the log K constant for the system reaction ranges from -14 till 17 approximately causing the 21-order magnitude change in the equilibrium constant. Note, however, that the GEM algorithm for chemical equilibrium  computations used in this work does not need these reactions nor their equilibrium constants.
Accuracy of generated chemical fields. Figure 3 compares two-dimensional chemical fields generated by the reactive transport simulation employing two approaches for the chemical equilibrium calculation. On the left, we collect the results generated by the conventional approach, and the right column of plots corresponds to the ODML method. To be specific, it shows the time steps 500, 1500, and 2500 (corresponding to 0.48, 1.43, and 2.38 days of simulations) with the amounts of minerals measured in mol/m 3 . Dissolving CaCO 3 releases Ca 2+ , CO 2− 3 , and HCO − 3 ions. They, in turn, react with the ions of Mg 2+ (aq), injected by the brine, and precipitate CaMg(CO 3 ) 2 . After 500 time steps of the brine injection, we observe the dissolution of calcite (in blue) and simultaneous precipitation of dolomite (in orange). Here, some parts of the rock retain pure quartz (in gray), where dolomite is gradually dissolved away due to the continuous injection of the acidic brine. Figure 3c and d, corresponding to 1500 time steps of simulations, illustrate the fields with a preferential path forming in the parts of the rock with higher permeability and, as a result, larger amplitude velocities. Eventually, after 2.38 days, practically all calcite is replaced by the dolomite (see Fig. 3e and f). The fields generated by the ODML approach (on the right) are rather close to those on the left, demonstrating high accuracy of the method even when applied to heterogeneous rocks. Figure 4 illustrates the evolution of selected aqueous species after 27.42 minutes of simulations (or 20 time steps). We observe the local increase in all species   is able to maintain accuracy of the predicted chemical equilibrium states. These chemical fields correspond to the reactive transport simulations using the ODML acceleration strategy with tolerance ε = 0.001. The relative error obtained during the latter simulation is illustrated in Fig. 17 of Appendix. The confirmation on the elemental mass conservation constraint satisfaction for each element is presented in Fig. 18 of Appendix. We see that the mass balance relative error does not exceed the order of 10 −13 and lower depending on the element. Figure 5a illustrates the number of triggered on-demand learnings on each time step. We select different error control parameters ε to study how they affect the number of full GEM calculations (i.e., the on-demand learning operations) required for the ODML algorithm to satisfy each such tolerance. Most of the triggered learnings happen in the first 3,000 time steps (see Fig. 5b). After this, only 1-2 (out of 10,201) DOFs require occasional comprehensive and costly chemical equilibrium calculations to guarantee imposed accuracy levels on the chemical equilibrium states and permit subsequent states to remain accurately predicted. Even though these chemical states still contain precipitating dolomite and dissolving calcite, the ODML algorithm can successfully estimate them with minor errors. The legend of Fig. 5a includes the total number of training operations required during the reactive transport simulation with the ODML algorithm with different accuracy requirements imposed. Even though the number of trainings is ten times higher for the ε = 0.001 than for ε = 0.01, all the learnings' percentage remains below 0.02%. Figure 5 considers only the first 3,000 time steps to magnify the difference between the number of triggered conventional evaluations for each ε. We see that the total number of learnings triggered on these time steps for ε = 0.001, for instance, is 99.73% (13460 out of 13496) of all required full evaluations. We note that unlike homogeneous one-dimensional numerical test considered in [58], where on-demand learnings were the highest on the first few times step and then gradually decayed as reactive transport proceeds, in the heterogeneous case, we see several spikes in the number of learnings. For example, observe the increase in the number of learnings between steps 250 and 350, or between 850 and 1000. Such a sudden increase can be explained by the dissolution and precipitation front of the chemical system arriving to different parts of the rock with increasing/decreasing permeabilities and corresponding to more significant or lower amplitudes of the velocity. Having said that, the percentage of the total number of fast and accurate chemical equilibrium predictions enabled by the ODML algorithm remains higher than˜99.9% of all chemical equilibrium problems required in the simulations (i.e., less than˜0.1% of all such problems are solved using a full and expensive chemical equilibrium calculation provided by the conventional GEM or LMA algorithm). Figure 6a presents the computational cost of (i) conventional chemical equilibrium, (ii) smart chemical equilibrium, and (iii) transport calculations in relation to time steps. These simulation runs are performed for different ε assigned to the ODML algorithm. The transport costs are determined by the time required for solving the systems generated by the SUPG method. They remain 1-2 orders of magnitude lower than the conventional chemical equilibrium calculations. The figure also highlights that the ODML approach promotes considerable decrease of the CPU cost associated with chemical equilibrium calculations. More restricted ε is set, the higher the frequency of triggered on-demand learnings becomes. This   Figure 6b shows the ODML speedups corresponding to the CPU costs and tolerances considered in Fig. 6a. These speedups are presented step-wise as a ratio of the total time required for the conventional and smart chemical equilibrium calculations across all the DOFs. The red curve illustrates the ODML algorithm's speedup achieved for the strictest ε, and, as expected, it reaches the lowest speedup values until the time step 3000. The blue and purple curves depict the speedups of the ODML algorithm performed with tolerances ε = 0.01 and ε = 0.005, respectively, indicate a similar acceleration level. We highlight that all the lines converge to the same speedup 9×, approximately. This demonstrates that the ODML approach with on-demand clustering and priority-based search is able to attain similar speedup levels for different choices of accuracy tolerance. A stricter accuracy tolerance will cause this common speedup to be achieved later (after more time steps), but at the benefit of more accurate predictions for the rest of the simulation.

Computing cost reduction using the ODML (with the Pitzer activity model applied)
Besides the HKF activity model, we also evaluate the ODML performance with a Pitzer activity model for the aqueous phase. Figure 7 shows the computing cost (in CPU time) of the chemical equilibrium computations when using Pitzer activity model. Only 10% of 10,000 time steps are considered in this setting. We can note, compared to previous results using HKF activity model, that the speedups strongly depend on the choice of the activity model. The Pitzer model is considerably more computationally expensive to evaluate and require more Newton iterations Fig. 7 (a) Computing costs (measured as the CPU time in seconds) for each sub-procedure of the reactive transport simulation in relation to time steps (including the ODML algorithm run with different ε and using the Pitzer activity model). (b) The corresponding speedup factor in chemical equilibrium calculations achieved by the ODML application to minimize Gibbs energy than the HKF model. Because the ODML transforms a complex GEM calculation into a simpler matrix-vector multiplication (first-order Taylor prediction), the achieved speedups are higher the more expensive the GEM calculation is. Thus, when the Pitzer activity model is used, the obtained speedups are considerably higher than that when using HKF. Figure 7b confirms for two different options of accuracy tolerance, ε = 0.01 and ε = 0.001, that for simulations with the Pitzer model, the ODML algorithm achieves speedups 100-150x (10 times higher than for the HKF model). The list of the ODML method characteristics for various tolerances and activity models are summarized in Fig. 16a.
Clustering during the simulation process During the reactive transport simulation, we adopt the on-demand clustering strategy introduced in [58]. This strategy classifies chemical states on a subject of primary species associated with each of them. The set of clusters is updated every time the on-demand learning operation happens. During such learning/training, a new fully evaluated chemical equilibrium state is produced (with a zero-priority rank) and stored. Suppose the primary species of this chemical state coincide with the primary species in one of the existing clusters. In that case, this particular cluster will be enriched with a newly learned reference state. If we fully compute a new chemical equilibrium state and no clusters correspond to that state's primary species, a new cluster is created to store it. Whenever a reference chemical equilibrium state is successfully used to predict another, its priority rank (and the priority rank of the cluster) is incremented. Within each cluster, the records of learned chemical equilibrium states are sorted to use those with higher success rates/ranks first. Table 3 lists only those clusters that were successfully used as the reference states more than twice (with the ranks higher than one), keeping the numbering, in which they were added during the test. The first column reflects this order (in time), whereas the second one lists the primary Clusters # reflects the order they were generated. Frequency/Rank indicates the how often a cluster was used to retrieve suitable reference equilibrium state for new prediction. Column with # of Records corresponds to the number of fully evaluated and saved chemical states every cluster contains. Clusters #21 and #28 are responsible for 48,09% and 27.93%, respectively, of all 102,010,000 fast predictions, and together, these two clusters only have 4 learned equilibrium calculations species. Besides, the table presents the frequency each cluster was used in Taylor extrapolation as a reference equilibrium state as well as the number of equilibrium states each cluster stores. Clusters 1-19, 23, 27, 30-33 contain the stable calcite phase, but it is unstable in some of the clusters created on later times, reflecting the equilibrium states in which this mineral becomes completely dissolved. Clusters 20-22, 26 are responsible for chemical states where dolomite is stable or precipitates. Cluster 21 has the highest rank in providing reference equilibrium states for the accurate ODML approximations. This cluster's single reference equilibrium state was successfully used 49,064,564 times for predictions, which accounts for 48.06% of all chemical equilibrium evaluations.

Case II: Reactive transport of H 2 S scavenging
We continue with a reactive transport modeling of hydrogen sulfide scavenging, a widely adopted practice in the oil and gas industry production and processing operations. By sulfide scavenger, we understand a mineral that can react with sulfide species (H 2 S, HS − , and S 2− 2 , etc) and convert them to a more inert form. For that purpose, siderite (FeCO 3 ) is considered below. Generally, the increase of the hydrogen sulfide mass is related to activities of sulfate-reducing bacteria (SRB) as a side-effect happening in the waterflooded reservoir (also called the reservoir souring). The prediction of H 2 S generation and production in reservoirs remains a significant phenomenon for modeling and further understanding due to several reasons. Hydrogen sulfide is not only extremely toxic for humans and animals but is highly corrosive to most metals involved in the field operations. It may cause cracking of drill or transport pipes and tubular goods as well as destroy the testing tools and wire lines. Therefore, modeling of the mineral-H 2 S reactions is important for studying the field-specific hydrogen sulfide scavenging capacities.
Below, we provide the set of the most important reactions happening in this example together with corresponding log K constants ranging from -33.65 till 12.78: Numerical model setup Similar to the previous example, Fig. 8 presents a considered reactive transport numerical model. For the reservoir, the horizontal and vertical lengths are 100 and 30 meters. We consider 101 and 31 DOFs for its discretization, which results in a total of 3,131 DOFs that must be considered in each time step. We fix the temperature to 25 • C and inlet pressure to 1 atm (1.01325 bar), respectively. The heterogeneous permeability is illustrated in Fig. 9a, whereas the corresponding velocity v is shown in Fig. 9b. The diffusion of fluid species is neglected. The  Table 4. A detailed description of the chemical system can be found in Table 5. Note: In the ideal reservoir simulations, the rock matrix must contain a different proportion of minerals such as quartz, calcite, etc. However, such a simplification is assumed for the purpose of studying the scavenging process solely. As highlighted above, the considered numerical test conducted in this section studies heterogeneous sideritebearing reservoir continuously perturbed by the H 2 S-rich brine. Being highly soluble, FeCO 3 dissolves donating the iron ion Fe 2+ , i.e., FeCO 3 Fe 2+ + CO 2− 3 . Simultaneously, the donated ferrous ions react with the sulfides (delivered by the brine) such that iron-sulfide FeS (pyrrhotite/mackinawite) precipitates: Accuracy of generated chemical fields The dissolution of FeCO 3 (siderite) and precipitation of FeS (pyrrhotite) are  shown in Fig. 10. We arrange the chemical fields generated by the reactive transport simulations using the conventional chemical equilibrium solver on the left side. We also highlight the parts of the rock with a preferential path formed due to higher permeability. The two-dimensional chemical fields on the right side correspond to the same numerical test run using the ODML algorithm with ε = 0.01. The minerals' behavior is rather accurately approximated using ODML with even such a relaxed tolerance.
The dissolution and precipitation of minerals are accompanied by the increase and decrease in the aqueous species concentrations. Figure 11 shows the iron ions behavior at the same steps as the minerals' two-dimensional chemical fields discussed above. Throughout the plot, we observe an initial gradual increase of Fe 2+ as a result of FeCO 3 dissolution. It is followed by the considerable drop of the iron ion concentration at the point of the phase transformation from one mineral to another as it gets used by the FeS formation. The region's width, where Fe 2+ is increased, is also getting more significant as the reactive transport simulation proceeds. Figure 12 compares two-dimensional time snapshots of the sulfides S 2− 2 , HS − , and H 2 S(aq) at the time step 400 (˜141.04 days). The profiles with elevated species amounts coincide with parts of the rock penetrated by the injected brine. They also correspond to the sideritepyrrhotite transformation front. Figure 12 verifies that the smart chemical equilibrium algorithm use does not reduce accuracy during the simulation, as the left and right chemical fields are relatively similar. Another confirmation is provided in Fig. 19 of Appendix, presenting the relative error obtained during the simulation run with the ODML method. On top of this, the satisfaction of element mass balance conservation is automatically in-build into the reconstructed species abundances. Figure 19 of Appendix highlights that by depicting the relative error in the mass balance equation for selected elements.
Number of on-demand learning operations Next, we study the dependence of the number of on-demand learnings on the ODML's error control tolerance. In Fig. 13, the number of required comprehensive evaluations can reach up to 60, 22, or 20, depending on the selected tolerances ε = 0.001, ε = 0.005, or ε = 0.01, respectively. We see that triggered learnings are solely required up until 1,300 steps as the inserted brine penetrates the reservoir content. Again, we include the total number of on-demand trainings needed for each tolerance selected for the ODML in the legend of Fig. 13. This number is followed by the percentage it accounts from the total 6,262,000 chemical equilibrium problems that must be evaluated along the whole simulation process. As expected, the highest total number of learnings corresponds to the strictest tolerance ε = 0.001 (red marker). Due to heterogeneity of the medium, the velocity changes might cause more critical perturbations during the later transport step and, as a result, distinct initial chemical compositions for the ODML algorithm. It explains the isolated increase in learnings per time step (see, e.g., time steps 900-1,100 in Fig. 13). Nevertheless, independent of the tolerance, about 99.98% of all chemical states are approximated using smart predictions based on the priority-based clustering combined with the first-order Taylor extrapolation. For the Pitzer activity model, the total number of learnings is slightly smaller for more relaxed tolerance ε = 0.01 and higher for ε = 0.001, even though the profile of occurring training per time step looks relatively similar in Fig. 13a and b. Computing cost reduction using the ODML (with the Debye-Hückel activity model applied) Figure 14a shows the computational cost (measured in seconds) step-wise spent Fig. 10 The amount of siderite (blue) and pyrrhotite (orange) measured in mol/m 3 in the two-dimensional rock core at time steps 100, 200, 400, and 800, corresponding to 35.26, 70.52, 141.04, and 282.08 days of simulations, respectively. The plots on the left correspond to the reference/benchmark chemical fields obtained using full GEM for chemical equilibrium calculation. The plots on the right are the results generated during the same simulation but utilizing the ODML algorithm with ε = 0.01 on (i) conventional and (ii) smart chemical equilibrium calculations (run with ε = 0.001, ε = 0.005, or ε = 0.01) as well as (iii) transport-related computations. Even though we consider the heterogeneous two-dimensional problem, transport costs remain over 1.5 times lower than the CPU costs of conventional chemical equilibrium calculations. The ODML algorithm manages to decrease CPU costs of the chemical simulation by about one Fig. 11 The amount of iron cation Fe 2+ in the two-dimensional rock core at 100, 200, 400, and 800 time steps, corresponding to 35.26, 70.52, 141.04, and 282.08 days of simulations using the ODML algorithm with with ε = 0.01 order of magnitude. Reactive transport simulations using the smart approach with ε = 0.001 have the highest costs (red marker) with occasional spikes, e.g., between time steps 900 and 1100, corresponding to higher learning rates. Next to the computation costs, we present speedup that the ODML can achieve in chemical calculations compared to the conventional approach. We observe that the average speedup with the Debye-Hückel activity model is over 30x times, except for ε = 0.001 on the first 1,200-1,300 steps. Nevertheless, the speedup is stabilized eventually around the value 33x, independent of how much overall trainings were performed and stored on the first steps. It confirms the efficiency of the reference chemical states retrieval procedure when the priority-based queue and clustering are incorporated.
Computing cost reduction using the ODML (with the Pitzer activity model applied) For comparison, we perform a similar calculation using the Pitzer activity model (see Fig. 15). Figure 15a indicates that the step-wise CPU costs utilizing the classical method (gray curve) are approximately two orders of magnitude more computationally demanding than when using the simpler DH model. The gray curve is at least several thousand times more costly than the transportrelated green curve. The blue and red curves, corresponding to the reactive transport simulation with the ODML method, indicate that the smart approach drastically improves the costs of the conventional calculations, yielding them comparable to transport operations. The corresponding speedup presented in Fig. 15b has initially relatively low values due to the excessive number of training operations. However, it reaches 3500x (blue line) and 2500x (red line) and finally stabilizes around 2000x value at later steps. The ODML performance summary for various tolerances and activity models is presented in Fig. 16b.
Clustering during the simulation process Similar to the previous section, we present the clusters generated during the reactive transport simulation with the ODML algorithm with ε = 0.01. Table 6 lists only those that were successfully used as the reference states more than twice, using the order they were created.    Even with the presence of the non-unit stoichiometry coefficients and a rather wide range of the equilibrium Fig. 15 (a) Computing costs (measured as the CPU time in seconds) for each sub-procedure of the reactive transport simulation in relation to time steps (run with different ε and using the Pitzer activity model).
(b) The speedup factor in chemical equilibrium calculations achieved by the ODML application(run with different ε) Fig. 16 Recapitulation of the total number of learning operations, the percentage of the smart predictions in relation to the total number of the chemical equilibrium calculations, the range of speedups in chemical equilibrium calculations step-wise, and the overall reactive transport simulation speedups for different tolerances applying (a) the HKF and the Pitzer activity models in the dolomitization example and (b) the DH and the Pitzer activity models in the scavenging example constant, the efficiency of the ODML approach remains similar to the above-reported test cases.

Discussion and conclusions
Reaktoro and Firedrake's coupling enabled investigating the performance of the new ODML algorithm on more challenging reactive transport problems. The obtained numerical results confirm that the resulting acceleration of the chemical equilibrium calculations provided by the new strategy depends on several factors. The first and most important one remains the activity models used for the aqueous species in the numerical experiment. The second is the chosen error control/accuracy tolerance (parameter ε) which determines how strict the acceptance criterion is. Ultimately, the obtained acceleration depends on the complexity of the numerical reactive transport experiment, e.g., chemical system size, heterogeneity, mesh dimension, etc. Having said that, the ODML algorithm enables speedups of one to three orders of magnitude in chemical equilibrium calculation and at least one order of magnitude of overall reactive transport simulations. A significant property of ODML strategy is its ability to converse mass of chemical elements (including the electric charge) up to machine precision levels, a capability not naturally found in prevailing neural network algorithms (e.g., neural network and most classes of surrogate models). This inherent feature of the algorithm is explained and demonstrated mathematically in [58].
To highlight the most critical performance characteristics of the ODML algorithm, we use Fig. 16a and b. To be specific, we recap the overall number of learning operations each simulation run required for various tolerances and activity models. Furthermore, we emphasize that the percentage of smart predictions remains greater than 99.8% in relation to the total number of chemical equilibrium evaluations throughout complete simulation run. We additionally include the lowest and highest speedups in obtained in chemical equilibrium calculations step-wise as well as the overall speedups in the reactive transport simulations enhanced by the ODML method.
In future works, we aim to apply the ODML strategy to accelerate modeling of chemical kinetics with partial chemical equilibrium assumptions, typical in geochemical systems. We equally consider further investigations with more challenging geochemical and geological models. We plan to extend Reaktoro's functionality to model reservoirs souring as a result of the activities of sulfide-reducing bacteria, mixing of the groundwater and seawater in the oil reservoir as well as scaling effects this process results to, modeling the effects that seawater or sodium chloride cause during the cement rock attack, among many more. An extension to the three-dimensional problem with heterogeneity will require a stable numerical scheme to solve the Darcy problem. To enable full coupling of the transport and flow problems, we plan to continue developing the reactive transport simulator and present results in future articles.
As this work presents a new coupling of the software packages to attempt reactive transport modeling, we acknowledge an important and well-known MoMaS benchmark [1,3,15,23,30,39] used to compare numerical solvers developed for reactive transport simulations in porous media. At this moment, we are limited to chemical equilibrium formulations in terms of the GEM, but are currently working on the new version of the ODML algorithm. Therefore, assessing its performance for the chemical equilibrium problems considered in the MoMaS benchmark is planned to be included in future work. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.