TOUGH3-FLAC3D: a modeling approach for parallel computing of fluid flow and geomechanics

The recent development of the TOUGH3 code allows for a faster and more reliable fluid flow simulator. At the same time, new versions of FLAC3D are released periodically, allowing for new features and faster execution. In this paper, we present the first implementation of the coupling between TOUGH3 and FLAC3Dv6/7, maintaining parallel computing capabilities for the coupled fluid flow and geomechanical codes. We compare the newly developed version with analytical solutions and with the previous approach, and provide some performance analysis on different meshes and varying the number of running processors. Finally, we present two case studies related to fault reactivation during CO2 sequestration and nuclear waste disposal. The use of parallel computing allows for meshes with a larger number of elements, and hence more detailed understanding of thermo-hydro-mechanical processes occurring at depth.


Introduction
The current development of georesources exploitation strongly relies on numerical simulation of the processes occurring at depth. Understanding of the coupled thermo-hydromechanical processes is essential to assess properly the changes in system conditions as well as to study the risks associated with the underground exploitation (e.g., loss of circulation; caprock failure; induced seismicity). Model developments and their applications constitute a huge step toward understanding coupled processes. Several numerical simulators are already available in the literature for the study of coupled processes at various levels of complexity. Some models allow for all Thermo-Hydro-Mechanical-Chemical (THMC) couplings. E x a m p l e s a r e : T O U G H R E A C T-F LA C 3 D [1 -3 ], OpenGeoSys [4], Dumux [5], COMSOL Multiphysics [6]. In addition, other simulators have been applied to study partial processes coupling, THM or THC: e.g., Sierra Mechanics [7], 3DEC [8], CODE-BRIGHT [9], CSMP++ [10,11], PFLOTRAN [12].
In numerical modeling, the governing equations (conservation laws of mass, momentum and energy) are solved considering the relationship among processes (e.g., coupling of two or more processes), and completed with constitutive laws, initial conditions and boundary conditions. One factor determining the computational effort is the number of simulated THMC coupled processes. Another factor is the numerical scheme. In the literature, the term monolithic refers to a scheme in which the physical equations for multiple processes are solved simultaneously, which may be computationally expensive. More loose couplings exist, such as one-way (i.e., a given process influences another, but not vice versa) or two-way sequential (i.e., the different processes are considered in sequence). Such schemes are less computationally intensive and may refer to the same simulator, but often the integration of different codes is used to take advantage of specialized codes and to increase the types of simulated processes [1,[13][14][15][16].
Given the complexity of the coupled processes, verification of the numerical approach is often an issue. Analytical solutions are only available for very simplified processes (e.g., only for fully saturated medium), and observations from lab and in situ experiments involve significant uncertainties. Benchmarking activities involving code-to-code comparison and validation against analytical solutions and experimental data are often in play for developing numerical models [17][18][19][20]. The TOUGH family of codes are commonly applied to model the coupling of fluid flow and heat transport in geological media [21,22], and have been extended to consider coupling to geomechanical processes. In particular, several TOUGH-based geomechanical codes have been developed to solve THM problems [1], among which TOUGH-FLAC is the most widely used, with recent applications featuring inverse modeling [23,24] and finite strain deformation [25]. Since its initial development in the late 1990s [26], TOUGH-FLAC has been applied to study geomechanical aspects of CO 2 sequestration, nuclear waste disposal, enhanced geothermal systems, underground gas storage and compressed air energy storage, gas production from hydrate-bearing formations, induced seismicity, as well as for the implementation and the study of constitutive equations ( [1] and references therein).
The most common version of TOUGH-FLAC accounts for the two-way sequential coupling of TOUGH2 [22] for the simulation of non-isothermal, multi-phase and multicomponent fluid flow with upgraded versions of FLAC3D (e.g., ver.4/5, [27]) for solving the mechanical equilibrium. The equations for fluid flow and geomechanics are solved sequentially, and the approach is unconditionally stable, using the fixed-stress split sequential scheme: the flow sub-problem is solved first with a fixed total stress field, which is then modified in the subsequent geomechanics sub-problem by using modified variables from the flow step [28].
Despite the wide use of TOUGH-FLAC coupled simulations in the literature, applications are often limited to a relatively small computational domain, with a number of elements usually smaller than 50,000. In this work, we moved one step forward by coupling for the first time the newly developed TOUGH3 [21] with versions 6.0 and 7.0 of FLAC3D [29], hence implementing a parallelized version of the well-known coupled simulator. The coupled simulator integrates all the new functionalities of TOUGH3, including the use of PETSc solvers, together with the improved solver performance in FLAC3Dv6/7 as well as the possibility of using Python scripting compared to the FISH programming embedded in previous versions of FLAC3D. After verifying the correctness of the approach comparing the simulation results with an analytical solution and with the previous version, we evaluate the performance of the newly developed approach. Finally, we present results of two case studies, aimed at understanding the potential for fault reactivation during CO 2 sequestration and the evolution of stress and strain during nuclear waste disposal in a deep geological repository.
Although specific to the well-known codes TOUGH3 and FLAC3D, the approach presented in the current manuscript can be generalized and be used by the entire geosciences' community working on sequential coupling for studying coupled processes. The sequential coupling of two codes both running in parallel is not trivial, and it represents a novel computational approach. We demonstrate that the use of a fast wrapper (i.e., written in Python) can help to strongly reduce the computation time.
2 Mathematical formulation and single code performance 2.1 Mass and energy balance equations of the fluid sub-problem The fluid flow formulation described in here, closely follows the description in TOUGH2/3 User's guide (e.g., [22]). For non-isothermal, multi-phase, multi-component flow, the mass of each component k can be generally written as summing over the fluid phases: where ϕ is porosity, S β is the saturation of phase β, ρ β is the density of phase β, and χ k β represents the mass fraction of component k in phase β.
The advective flow for the k-component is given by the sum over the phases: where u β is the volumetric flux derived from the Darcy's law: where κ is the absolute permeability. κ rβ , η β , and p β are the relative permeability, the viscosity and the pressure of the phase β, respectively. P β = P + P cβ , where P is the pressure of a reference phase (here, gas) and P cβ is the capillary pressure.
Mass transport can also occur via diffusion and hydrodynamic dispersion, expressed through Fick's law in the general form: where D k β is the hydrodynamic dispersion tensor depending on the porous medium dispersivity, the Darcy velocity, the coefficient of molecular diffusion, and the phase-dependent tortuosity.
The mass balance equation can be written in the general form for each component k: with r k being the change term denoting sinks and sources. Similarly, the energy conservation equation, accounting for heat propagation and fluid flow, can be written as: where Q represents the energy gain/loss from sink and sources, q h is the heat flow, and M h is the accumulation term. This latter takes into account the internal energy per unit volume as: where T is the temperature, ρ R and C R are the rock density and specific heat, and u β is the specific internal energy of the phase β. The heat flow due to conduction and fluid advection is given by: where λ is the thermal conductivity, F β = ρ β u β is the mass flow rate for the phase β, and h β the corresponding specific enthalpy. TOUGH3 solves the mass and energy balance equations by means of an integral finite difference method for space discretization with a first-order fully implicit time formulation. Each time step involves the calculation of the Jacobian matrix as well as the solution of the equations using Newton-Raphson iterations. The time steps are automatically adjusted given the convergence rate. A comprehensive description can be found in the TOUGH2/3 User's guide ( [21,22]).

Geomechanics sub-problem and coupling approach
Mechanical equilibrium is calculated by solving the momentum equation, that can be expressed as where σ is the Cauchy stress tensor, b is the vector of body forces per unit mass, and v is the velocity. By neglecting the inertial terms, and using the indexing notation, the equation above reduces to σ ij, j + ρb i = 0. The relation between stress and strain is then provided by a constitutive equation, linked to the nature of the medium being deformed. The coupling between fluid flow and deformation is described by the Biot's theory of poroelasticity. The total stress (negative for compression) is affected by the equivalent fluid pore pressure p eq ¼ ∑ β S β p β : where α is the Biot's coefficient and δ ij is the Kronecker's delta. The constitutive equation for a given β phase relates the variation in fluid content ζ V/V; β to the pore pressure p β , saturation S β , mechanical volumetric strain ε vol , and temperature: with α T representing the undrained thermal coefficient and M β being the Biot modulus, expressed as: where K β is the bulk modulus of the phase β and K is the drained bulk modulus.
The changes in fluid content are related to changes in porosity as: with Δϕ being a porosity correction term due to mechanical deformation [30]. This provides an approach for sequential coupling of fluid flow and mechanical calculation, such as the fixed-stress split sequential method [28,31]. In addition to porosity, other flow variables such as permeability k and capillary pressure P cβ may be affected by mechanical changes [1,26]. Plastic deformation can be accounted for by using a failure criterion f(σ n ) and decomposing the strain increment into the sum of elastic and plastic parts. For the latter its direction is specified as being normal to a potential surface g(σ n ) and following a flow rule: where the subscript i refers to the increment in strain/stress and λ is a plastic multiplier. FLAC3D solves the mechanical problem by means of an explicit finite different approach, in which the laws of motion are discretized at the nodes and the resulting system of equations is solved by explicit finite difference in time [29]. In the coupling with TOUGH, the mechanical formulation is always run to equilibrium every time step. Mechanical equilibrium is reached when the ratio between the maximum unbalanced force magnitude and the average applied force magnitude falls below a threshold limit. For time-dependent deformation the coupling needs to be adapted accordingly [18].

Single codes parallel computing performance
TOUGH3 is parallelized by means of MPI, while FLAC3D uses threaded processes. Up to very recently, FLAC3D only ran on Windows operating system, but can now be run on Linux -based systems. For a Windows-based system, to make use of the MPI parallelization provided in TOUGH3, the code can be compiled in Cygwin or Windows Subsystem for Linux (Windows 10 and 11). For the current manuscript, we used WSL1.
Each of the two codes was evaluated individually. The speed increase was calculated with respect to the single MPI process/thread calculation. Results are shown in Fig. 1. We evaluated TOUGH3 on two different operating systems (OS) but on an identical hardware configuration: a 32-core virtual workstation with AMD EPYC 7742 at 2.25 GHz and equipped with 64 GB RAM. For a simulation with 50,000 elements, we noticed for TOUGH3 a plateau in performance when reaching roughly half of the available cores on the Windows workstation, with a speed increase of about 4 (Fig. 1a). On a Linux workstation, the increase is observed up to the limit number of cores (increase up to 10-fold). Figure 1b shows the performance of TOUGH3 for a simulation with 800,000 elements: the speed increase is similar to the smaller mesh.
All our tests were done on a workstation (i.e., a shared memory system), as FLAC3D, and hence the final coupled code, is not yet supported on high-performance computing (HPC) clusters (i.e., a distributed memory system). The test simulation has also been designed to be a typical case for running a coupled problem (i.e., with relatively small number), and we decided to test in the most common conditions for a user. This implies creating output files, which are dealt with in serial, thereby compromising the speed increase. Note that TOUGH3 has demonstrated a strong scaling on distributed memory systems ( [21], Fig. 1), in a case with 2.88 million elements. Hence, the limited performance shown in Fig. 1 is related to the operating system (OS). For the given simulation, a common simulation case as highlighted above, the performance improves on a Linux OS (with the exact same hardware as for the Windows case), but does not reach the potential scaling we could have on HPC clusters [21].
The performance of FLAC3D was evaluated for three versions (v5, v6, and v7) only using the Windows workstation with up to 32 threads (i.e., same as the number of cores with the given hardware configuration). Figure 1c and d show the speed increase, with v7 being sensibly faster than v5 and v6 for the 50,000 elements simulation, while v6 and v7 perform similarly for a simulation with 800,000 elements. We observe a 10-fold and 12-fold increase for a simulation with 50,000 and 800,000 elements, respectively. While FLAC3Dv7.0 can also be run on a Linux OS, we prefer to show its performance under Windows for comparison with the previous versions of the simulator.

Coupling strategy
The main modifications of the coupling strategy compared to the previous versions are: & Use of TOUGH3 with parallel computing and use of FLAC3Dv6/7, both allowing for faster calculation; & Use of binary instead of ASCII exchange files. This is particularly important when dealing with computational domains containing a large number of elements; & Use of a flag system to keep FLAC3D mechanical state into memory. In the previous formulation, a new instance of FLAC3D was called at each time step. At a given flow time step "i", the full mechanical state was saved to a binary file and loaded back at the time step "i + 1" and solved for mechanical equilibrium with the new hydraulic solution from TOUGH3. In the new approach, the FLAC3D model state is only loaded at the beginning of the simulation (i.e., for the initial conditions), then FLAC3D is paused during TOUGH3 execution and the model state kept in the memory, thereby avoiding overhead caused by restoring/saving the mechanical state at each TOUGH3 time step; & Use of Python to read/write coupling files in FLAC3D and calculate internal variables. By using NumPy [32], this can produce up to 34× faster execution (10× in average) in variable allocation compared to the previously used FISH scripting [29].
The strategy proposed here is general and can be applied in both Windows and Linux environments. All the coupled simulations presented here were carried out on the Windows virtual workstation that allows using up to 32 MPI processors/threads.
The general coupling strategy between any TOUGH-code and FLAC3D is based on file exchange to share variables and/ or properties. In MPI codes, and in particular for TOUGH3, a processor is assigned as "IOProcessor" and takes care of all the input/output functionalities of the code. Figure 2 describes the coupling strategy for TOUGH3 and FLAC3D. Figure 2a shows the numerical scheme for the coupling in time between the two codes, and Fig. 2b provides details of the coupling for each time step.
As in previous versions of TOUGH-FLAC, and in agreement with the fixed-stress split sequential method, TOUGH3 is the main code, which runs the simulation through time and advances the time steps according to the stability of the fluid flow sub-problem. At each time step and during the first Newton-Raphson iteration of TOUGH3, FLAC3D computes the mechanical equilibrium to a predefined convergence threshold.
For each time step, before starting the iterations for solving the fluid flow sub-problem, TOUGH3 invokes a subroutine to gather the arrays from all the MPI processes. Such arrays (pressure, temperature, saturation and capillary pressure) are written to file TOU_FLA by the IOProcessor, together with a flag (1). The IOProcessor performs this entire stage in serial, while all the other n MPI processes are idle. During this process, FLAC3D is idly waiting for the flag to change. Then, FLAC3D (i) reads the TOU_FLA file (in serial), (ii) solves for mechanical equilibrium (in parallel), and (iii) writes the FLA_TOU file to transfer data to TOUGH3 (in serial), and modifies the flag (2). At this stage, the subroutine invoked previously by TOUGH3 is waiting for FLAC3D to finish execution, then the IOProcessor serially reads the flag and the FLA_TOU file and distributes the variables/ properties (bulk modulus, Biot's coefficient, strain, and stress) to all n MPI processes.
Finally, the parallel computing can restart with the calculation of mechanically-induced changes of flow properties and with continuation of the flow iterations to finish the current time step. When TOUGH3 is at the last time step, it will issue a flag (3) that FLAC3D will interpret and save the final state in a binary file. The mechanical state as well as the flow variables can also be saved at predefined times during execution. The use of Python within FLAC3D allows for easier handling of arrays (e.g., mapping of a given variable or extra postprocessing computation), and also for handling and personalizing the entire output functions, making the use of standard TOUGH3 output functions redundant. performance on two workstations operating on the same hardware configuration but running on different OS, for a computational mesh with 50,000 and 800,000 elements, respectively. (c,d) FLAC3D version 5, version 6, and version 7 performance on the Windows workstation, for a computational mesh with 50,000 and 800,000 elements, respectively In general, the fluid flow sub-problem and the mechanical sub-problem are treated independently regarding the convergence. This approach is numerically quite stable for a wide range of mechanical problems (e.g., thermo-poroelasticity or quasi-static fault reactivation), but it always requires an appropriate choice of time stepping. The accuracy of the solution is linked to the choice of the time step, which needs to be carefully chosen in relation to the physical processes under investigation. As described above, in TOUGH-FLAC, the time stepping is dictated by TOUGH3, whose automatic time-stepping is linked to the flow problem but that may result in inaccuracy of the mechanical solution if not carefully set (e.g., undrained response). The choice of an appropriate time stepping is even more relevant when including timedependent deformation (e.g., viscoelastic or creep). For this latter case, the TOUGH3 time stepping needs to be Fig. 2 a TOUGH-FLAC numerical scheme in time. The shaded area is the time step between t n and t n + 1 . The figure was modified after Blanco-Martín et al. [25]. b Coupling strategy between TOUGH3 and FLAC3D for each time step. Green parts are executed in parallel, while red parts are executed in serial. P, T, S, P cap are pore pressure, temperature, saturation, and capillary pressure, respectively. K, α, ε, σ, refer to bulk modulus, Biot's coefficient, strain, and stress, while Δϕ and Δκ stand for porosity and permeability changes harmonized with the time stepping of the deformation in FLAC3D [18]. In the case of frictional instability (e.g., fault reactivation), the choice of the time stepping could also be relevant. The case of quasi-static approach is essentially simulating the whole time-dependent frictional instability (e.g., an earthquake) in a single time step and relies only on the final stress drop to be redistributed in the nearby medium. This approach has been used extensively with TOUGH-FLAC to simulate fault reactivation [33]. In the quasi-static approach, the case of slow deformation on a fault (usually referred to as slow-slip) is treated as linear-elasto-plastic deformation. However, this may lead to inaccuracy, and in order to correctly simulate the case of fault slow-slip (or creeping) or even the full generation of waveforms, one would need to implement a quasi-dynamic or dynamic approach, in which the fluid time step would need to be strongly harmonized with the deformation time step. Furthermore, accounting for permeability changes linked to stress/strain introduces an extra non-linear term in the solution of the flow sub-problem.

Analytical solution
An analytical solution can be derived for a Terzaghi-like problem [34]. The problem here is part of the BenVaSim initiative to verify and benchmark several numerical codes [20]. While being a simplified numerical exercise, the model setup physically resembles a dam construction in a flooded drift with a pore pressure gradient allowing water flowing through the host rock. As the current approach is based on previous versions, more verifications can be found elsewhere [25].
The model is one dimensional, fully saturated 10 m-long domain with displacement completely fixed in y-and z-directions and at x = 10 m (Fig. 3a). The initial pressure and total stress in the model are set to 0.1013 MPa. Pore pressure is 1 MPa at the left boundary (x = 0 m), and 0.1013 MPa at the right boundary (x = 10 m). A total stress of 1 MPa is applied at time t = 0 + at x = 0 m. The base case scenario accounts for a porous and low-permeability material (porosity ϕ = 0.15; permeability κ = 10 -20 m 2 ) with stiff and deformable matrix (Young's modulus E = 8 GPa; Biot's coefficient α = 1, and Poisson's ratio ν = 0 to allow for 1D problem). Variations to the base case account for compressible grains (Biot's coefficient α = 0.75) and for very soft material (Young's modulus E = 150 MPa). Both TOUGH3 and FLAC3D convergence thresholds are set to 10 -7 . Figure 3b shows the profiles of the pore pressure at different times comparing the numerical (solid line) and the analytical (shaded line) solutions. Results show that for different times, the developed approach is able to match the analytical solution, with only minor differences mostly related to space discretization (50 elements for the 10 m long domain). The initial pressurization of the system (undrained response at 0.003 y) is somewhat larger than the analytical solution due to coarse time discretization, while the steady state is well matched (at 30 y for the base case scenario). A more compressible grain will result in less undrained pressurization (Fig. 3c), while the soft matrix will result in much larger deformation, allowing for larger porosity decrease in the matrix, and larger pressure increase during the undrained response (Fig. 3d). As it can be seen, the numerical approach is in good agreement with the analytical solution also for more critical scenarios.

Comparison with previous simulator
In order to verify the validity of TOUGH3-FLAC3Dv6/7, we compared results with the previous TOUGH2-FLAC3Dv5 [25]. We checked the results for variables such as injection pressure and temperature as well as the uplift of the top boundary.
We account for a 3D computational domain (10 km × 10 km × 4 km) fully saturated and with homogeneous hydraulic properties (constant permeability κ = 5·10 -15 m 2 , initial porosity ϕ = 0.1, rock grain density ρ R = 2550 kg/m 3 , fluid density dependent on pressure and temperature as default for TOUGH3, rock grain specific heat C R = 800 J/kg°C, heat conductivity λ = 2.0 W/ m°C). For simplicity, we assume a constant permeability, while the porosity changes as described in Eq. 13. The model ranges from a depth of −2 km to −6 km, and the top and bottom boundaries, as well as the boundaries at x = y = 10 km, are open to fluid flow. The boundaries at x = y = 0 km are closed and allow for symmetry. Mechanically, we assume a poroelastic material (Young's modulus E = 10 GPa, Poisson's ratio ν = 0.25, Biot's coefficient α = 1), with the top and side boundaries (x = y = 10 km) at fixed stress conditions, with rollers for all the other boundaries. We assume initial hydrostatic gradient for pore pressure, geothermal gradient for temperature (30°C/km), and lithostatic gradient for stresses. We simulate 60 days of cold-water injection (T = 10°C) in a saturated medium with variable rate (30 days at 30 kg/s, 20 days at 60 kg/s, and 10 days at 90 kg/s), followed by 40 days of shut-in period for a total simulation time of 100 days. The injection region is at a depth of 4 km and extends over a region 50 m × 50 m × 50 m. For both simulations, we use a mesh with~46,500 elements. Porosity changes depend on the bulk modulus and on the total volumetric strain, when larger than 10 -4 . The FLAC3D mechanical ratio between the maximum unbalanced force magnitude and the average applied force magnitude is set to 10 -7 . The TOUGH3 convergence criterion is set to 10 -5 .
As shown in Fig. 4, the two approaches are in extremely good agreement, with differences in pressure in the order of some Pa, minor differences in temperature, and differences in uplift in the order of some microns. For each computational domain, we have created similar initial conditions (via steady state simulation), to avoid biases on the final simulation, which is the same as what was described in Section 4.2. Some differences may arise for small meshes in  Fig. S1). For a single MPI process/thread, the code is 1.5x faster compared to the previous version only owing to better I/O handling and use of Python in FLAC3D. Due to parallelization overhead (communication and domain decomposition), the speed up clearly depends on the size of the mesh, with an increase of up to 2× faster for a coarse mesh with 15,000 elements and up to 5× faster for a relatively fine mesh with 800,000 elements. Interestingly, for all the cases the performance does not improve after reaching 16 cores/threads. This is consistent, however, with the performance observed for TOUGH3 on the same hardware configuration ( Fig. 1a  and b), which is attributed to conflict with the operating system processes, as discussed above. FLAC3D is not affect by this behavior, so in general we obtain a better performance in terms of speed-up for TOUGH3-FLAC3D compared to TOUGH3-only, although the absolute running time is much smaller for the latter. For a fairer comparison with the previous  versions of the TOUGH-FLAC coupling, we present the testing only on a Windows-based system, although the overall performance should be better on a Linux-based system (as highlighted by Fig. 1a and b).

Fault reactivation during CO 2 sequestration
The model presented here closely follows previous works addressing the same topic [33]. A three-dimensional model was already proposed by Rinaldi et al. [35], who addressed the effect of the well orientation on induced seismicity and CO 2 leakage through the fault. Here, thanks to the faster solver, we introduce a further complication in the model, which is the presence of a multiple fault system (Fig. 6). A similar model was also employed for studying the natural seismicity occurring at Matsushiro, Japan [36], but here the much larger number of elements allows for better details.
We simulate a 3D computational domain 10 km × 10 km × 3 km with 91 × 165 × 74 elements in the three directions (total of about 1 million elements). The two fault zones strike N90°and N180°while dipping 90°and 80°, respectively, and intersect at the center of the computational domain, assuming the north is oriented in the y-direction. (Fig. 6). Injection occurs in a 100 m thick reservoir, bounded by two 150 m thick caprock formations, at a distance of 500 m from each fault (at a single point x = 4500 m, y = 4500 m, z = −1500 m), and with a constant rate of 50 kg/s (1.6 Mt/y) for a total of 3 years. Initial conditions account for hydrostatic pressure and geothermal gradient, and the simulation is considered isothermal (i.e., temperature is only needed to calculate the fluid viscosity and density). Initial stress follows a strike-slip regime, with both maximum and minimum principal stresses horizontal. The maximum horizontal stress (σ H ) is oriented N45°with a stress ratio σ H /σ V = 1.12, while the minimum horizontal stress (σ h ) has a ratio of σ h /σ V = 0.62, with the vertical one (σ V ) being the lithostatic stress. Boundaries are all open to fluid flow with constant lithostatic stress and hydrostatic pore pressure, except for the bottom where the displacement normal to the boundary is null. The system is initially fully saturated with brine, with retention curves for capillary pressure and relative permeability following van Genuchten [37].
For the sake of simplicity, we do not include permeability changes at this stage, as we focus on the reactivation of the faults. Permeability changes are instead more relevant when studying CO 2 leakage [38]. We account for the full hydromechanical coupling by modeling porosity changes as function of the volumetric strain and pore pressure.
We assume for all rock formations elastic rheology, except the core of both faults, which follows a strain-softening ubiquitous-joint model with frictional law depending on the accumulated plastic strain [39]. Both elastic and hydraulic properties for the different domains are listed in Table 1. The simulation execution time with the given setup is comparable to a similar case in 2D and single fault for the previous version of the simulator (some hours).
Pressure evolution and CO 2 plume are shown in Fig. 7. Results show that the pressurization of the reservoir is quite fast with changes up to 4 MPa near the injection point. Both Fig. 6 Computational domain for the study of fault reactivation during CO 2 injection faults start pressurizing right after injection starts, and less than 5 MPa are needed to reactivate both faults, with reactivation time depending on the fault strength (or actually on the fault orientation with respect to the state of stress). The CO 2 plume is still confined close to the injection, extending up to 200 m when both faults are reactivated. Figure 8 shows how the rupture starts occurring on Fault 1 (the vertical blue fault, Fig. 6), and it is followed several days after by reactivation on Fault 2 (the dipping yellow fault, Fig. 6). This is consistent with the state of stress, according to which Fault 1 is favorably oriented for shear activation. Given the frictional law in the ubiquitous joint model, the friction angle drops in the ruptured area (i.e., the one where plastic strain accumulates) from the peak value (31°) to the residual (29°). Reactivation on Fault 1 occurs after only 30 days of injection with maximum slip of 0.6 cm ( Fig. 8a-b), and it is followed by the reactivation on Fault 2 after 70 days of injection with a maximum slip of 0.2 cm, which involves only a small minor patch on the fault plane ( Fig. 8c-d). Worth to note that while the injection continues, the rupture continues extending on the two faults, reaching a maximum extent after 180 days of injection, which is much shorter than the total injection time of 3 years. The rupture area is larger on Fault 1 given the more optimal orientation for shear rupture with respect to the stress field.

Potential for fault reactivation during geological nuclear waste disposal
The model presented in this section aims at understanding the stress and strain changes occurring at depth in a nuclear waste geological repository. We simulate the heat generated by several, parallel nuclear waste emplacement tunnels located in argillaceous clay host rock, i.e., following the Swiss concept for geological nuclear waste disposal [40]. The use of a refined mesh allows for more details. Figure 9a shows the 10 km × 10 km × 5.5 km three-dimensional computational domain, with 113 × 102 × 86 elements in the three directions and starting at the ground surface. We simulate conditions during 2000 years after nuclear waste disposal in a repository located in a clay layer with anisotropic permeability at a depth of 700 m, and embedded within two seal formations as well as under-and overburden. Thanks to symmetry, we simulate only a quarter of the domain, and simulate 13 half-length tunnels. Each tunnel has a length of 450 m and the tunnels are 50 m apart. Each element of the tunnel has a finite volume with a heat source variable in time (Fig. 9b), and is connected to two elements of the main computational domain (Fig. 9a). The boundaries at x = 0 m and y = 0 m are closed to fluid flow and have null displacement normal to the boundary to simulate symmetry. The other boundaries are open to fluid flow and have constant lithostatic stress and hydrostatic pore pressure, with the exception of the bottom boundary where the normal displacement is blocked, and the pore pressure is set to~54 MPa. Initial conditions follow hydrostatic and geothermal (30 C°/km) gradients, while we impose normal stress conditions with the lithostatic vertical stress (σ V ) being the maximum principal stress and with σ xx = σ h as minimum principal stress. We impose stress ratios σ h /σ V = 0.58 and σ H /σ V = 0.8. We simulate full thermo-hydro-mechanical coupling by assuming that the porosity depends on strain, pressure, and temperature. We neglect at the current stage any permeability variation as a function of the stress field. For simplicity, all the layers follow an elastic rheology, and we varied only the hydraulic properties (Table 2). We use here single-phase conditions, assuming that the tunnels are already fully saturated at hydrostatic pressure conditions after emplacement, i.e., ignoring some of the short-term re-saturation processes that may take tens of years [2] as well as potential gas generation [41,42]. The full 2000 years are simulated in a bit more than 2 hours. Figure 10 shows the temperature and pore pressure distribution at different times of evolution. Thermal effects are slower and only few°C changes are observed 10 years after emplacement in the near repository region (Fig. 10a), but the temperature changes are large enough to enable so-called thermal pressurization, a phenomenon known to occur when heating pore fluids in low permeability rocks, such as shale [43][44][45]. In this case, the thermal pressurization causes a relatively small pore pressure  (Fig. 10d). The domain remains fully saturated.
At later stages, temperature changes start distributing in the domain up to several hundreds of meters outside the repository after 1000 years (Fig. 10c). At the same time, the pressure changes are diffused in the low permeable clay formation, and only minor changes are observed after 1000 years (Fig. 10f).
The temperature changes due to the heat generated by the nuclear waste can be responsible of quite large deformation at the ground surface, up to several cm uplift after 1000 years (Fig. 11a, b, and c). We also evaluated the potential for fault reactivation. Starting from the changes in the full stress tensor, we evaluate the Coulomb Stress change as ΔCFS = Δτ + μ(Δσ n + Δp), where Δτ is the change in shear stress, Δσ n is the change in normal stress, Δp is the change in pore pressure and μ is the friction coefficient, with the convention of stresses negative for compression. Shear and normal stresses are calculated for faults striking parallel to the tunnels and with 80°d ip angle toward the repository (i.e., strike N180°). Figures 11d-f show how the repository itself is undergoing negative Coulomb stress changes, meaning that failure is hindered. At early times, the thermal pressurization is causing only more compression and stabilizing faults in the near repository region (Fig. 10d). Failure of the considered fault orientation is, however, favored at greater depth, where more seismogenic faults could be present. This is linked to the shear transfer caused by temperature changes, and it is then particularly relevant when the thermal effect starts distributing outside the clay formation (e.g., at 100 or 1000 years - Fig. 11ef). These results are valid for steeply dipping faults and are well in agreement with a recent 2D study on fault reactivation during disposal of nuclear waste at depth [46]. The removal of some bottlenecks, thanks to the use of binary files rather than ASCII, and the removal of save/restore operations for FLAC3D, largely helped in improving the performances of the coupled approach. The current coupled code allows up to 5-fold increase in execution speed for 32-core workstation compared to the previous version for a mesh with about 800,000 elements. The main goal of the current work is to demonstrate how the code has improved compared to previous implementations, rather than to show the best conditions (i.e., OS choice) to simulate coupled processes with TOUGH-FLAC. For a fair comparison with the previous versions, it was necessary to test the current coupling approach on a Windows-based system. We expect the overall performance to significantly increase on a Linux-based system, although still limited to hardware setup for a shared-memory workstation as compared to a cluster.
The possibility to run problems with very large number of elements in the computational mesh will enable a more detailed description of the thermo-hydro-mechanical processes occurring at depth. We have provided in this work two examples: one relates to fault reactivation during CO 2 sequestration, and the second one relates to nuclear waste disposal. For the first test case, we were able to simulate reactivation of intersecting faults during injection operations. The example In the second test case, we demonstrate the use of the approach to simulate multiple emplacement tunnels at high details. Albeit simplified, the example shows the evolution of stress and strain in a deep geological nuclear waste repository, including the potential for fault reactivation.
Two main points, somewhat negative, are raised for the sake of completeness: & It is worth to mention that TOUGH-FLAC is based on the use of commercial software. While the running scripts as well as the coupling may be obtained for reproducibility, a user would still need to acquire the independent licenses for both TOUGH3 and FLAC3D. & The current coupled code, however, does not provide yet the improved performances that could be desired for HPC in clusters (distributed-memory systems). Certainly, one drawback with the use of TOUGH3-FLAC3D is the limitation to run exclusively on dedicated workstations (here tested on Windows-based machines). Itasca plans on releasing a MPI version of FLAC3D in the future (personal communication with Itasca, June 2022), and the current version of TOUGH3-FLAC3D will certainly be a strong base for future coupling of MPI versions for HPC.
Funding Open access funding provided by Swiss Federal Institute of Technology Zurich.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.