1 Introduction

Among the various ways of reducing global carbon dioxide (CO2) emissions through storage and utilization, CO2 geological storage and utilization (CGSU) has attracted increasing attention from researchers worldwide. In recent years, underground storage of CO2 in the goafs and tunnels of abandoned mines has attracted attention. However, fractures and damage caused by tunnel excavation can seriously affect the surrounding environment and operational safety, which restricts CGSU engineering practice in abandoned mines. Therefore, the fracture characteristics of the surrounding rock mass play a significant role in pre-injection site characterization and post-injection risk evaluation of fault reactivation. Numerical simulation has been widely used to design and investigate rock fracturing in underground construction owing to its high efficiency and low cost. Among the numerous simulation methods, the finite element method (FEM) handles material deformation well, whereas the discrete element method (DEM) explicitly represents material fracture. The combined finite-discrete element method (FDEM), proposed by Munjiza et al. (1995), combines the two methods, and a corresponding open-source program, Y2D, was presented (Munjiza 2004). By discretizing the model into finite elements and cohesive elements, FDEM can simulate the transition from continuous to discontinuous behavior through material fracture and fragmentation (Munjiza et al. 1999), which makes it possible to study the progressive failure of rocks and rock-like materials. In recent years, further versions have been developed, including Y-Geo (Mahabadi et al. 2012), HOSS (Gao et al. 2018), Irazu (Lisjak et al. 2018), and a thermo-mechanical FDEM (Wang et al. 2021).
To date, FDEM and other hybrid methods have been widely employed in fracture modeling (Liu et al. 2019; Ma et al. 2022; Munjiza et al. 2020; Wang et al. 2022a, 2020). The approach can explicitly simulate fracture initiation and propagation, making it an effective means of simulating large deformation and progressive fracture of rock masses.

The reliability and accuracy of simulation results are of great importance when modeling the cracking process. Reasonable simulation models, precise meshing, and accurate parameter values are essential for accurate FDEM results. The implemented Mohr–Coulomb criterion can simulate the fracturing of geomaterials but requires more parameters to be specified. The large number of parameters and their uncertainty are major factors affecting simulation results at both the laboratory and field scales (Wang et al. 2022b). An FDEM fracture model based on the Mohr–Coulomb criterion can involve more than 10 parameters, including density ρ, Young's modulus E, Poisson's ratio ν, normal/tangential penalty Pn/Pt, fracture penalty Pf, cohesion c, tensile strength ft, friction coefficient tanφi, and mode I/II fracture energy \(G_{fI}\)/\(G_{fII}\). Some of these, in particular the micro-parameters, cannot be evaluated or determined directly by laboratory or site tests and are therefore approximated by calibration. Moreover, other parameters related to control conditions, including the time step (\(\Delta t\)), loading–unloading rate (\(\dot{k}\)), and viscous damping μ, significantly influence the calculation results. Extensive research has been devoted to calibrating the first type of parameters, which generally involves a series of iterations and adjustments comparing simulated results with laboratory results from uniaxial and triaxial compressive strength (UCS–TCS) tests, Brazilian disc (BD) tests, and shear tests (Mahabadi 2012a; Tatone and Grasselli 2015); other calibration approaches are reported in Exadaktylos and Stavropoulou (2008) and Stavropoulou et al. (2010, 2012). Model parameters were discussed through UCS and BD simulations of different model sizes (Lisjak et al. 2012; Mahabadi 2012a; Mahabadi et al. 2010b), and reasonable values of the time step and loading rate were discussed by comparison with laboratory test results and theoretical analysis (Feng et al. 2018, 2020a, 2020b). However, calibration of the unloading rate and its comparison with laboratory test results and engineering practice are seldom reported. In this study, the model parameters are determined accurately on the basis of efficient parallel computing. The short-term damage and fracture of the surrounding rock during tunnel excavation are then accurately simulated by FDEM, considering the influence of the control-condition parameters loading–unloading rate (\(\dot{k}\)) and viscous damping μ, which provides a basis for studying tunnel instability and stability control.

The rest of this paper is organized as follows. Section 2 presents the calculation principles of FDEM, its fracture model and rock fracture modes, and examines the fracture parameters. In Sect. 3, UCS simulations and the resulting fracture patterns are used to investigate the model parameters, including the loading rate \(\dot{k}\), time step \(\Delta t\), and viscous damping μ, by comparing simulation results with laboratory rock mechanics results and theoretical analysis. Section 4 presents the FDEM program for large-scale simulation of underground openings and the GPU parallel programming improvements for engineering practice of FDEM, introducing the theory and CUDA implementation of the proposed contact detection algorithms. In Sect. 5, the excavation and unloading method is optimized and improved with respect to the damping parameters, the dynamic energy dissipation mechanism of the system, and the calculation time. Conclusions are given in Sect. 6.

2 Calculation principle and improvement of FDEM

2.1 Calculation principle and parameter value of FDEM

FDEM incorporates the features of FEM for calculating element deformation, stress, and strain. Building on DEM, the discrete elements are further divided into triangular finite elements and quadrilateral joint elements to analyze the deformation and fracture of materials. The deformation and internal forces of the triangular elements are solved by the finite element method, and the joint element is a zero-thickness bonding element inserted on the common boundary of adjacent triangular elements (see Fig. 1). Whether a joint element breaks is determined by specified criteria, which simulates the initiation and propagation of material cracks. Meanwhile, FDEM retains the characteristics of DEM: the positions of the blocks are updated at each time step, which involves element retrieval and contact judgment for the discrete system. Contact search is a core component of FDEM and uses the no-binary search (NBS) method proposed by Munjiza and Andrews (1998). Unlike the simple proportional superposition method usually used in DEM, FDEM adopts a distributed penalty-function contact force algorithm that accounts for the shape and boundary of the mesh. From the unbalanced force acting on each element node, the node acceleration is calculated by Newton's second law, and the new nodal velocity and displacement are then obtained.
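The nodal update just described can be sketched in Python as follows. This is a minimal illustration, not the Y-code implementation: the `Node` class and `explicit_step` function are hypothetical, and we assume a semi-implicit (symplectic) Euler update in which the new velocity is used to advance the displacement, which is one common explicit choice.

```python
from dataclasses import dataclass


@dataclass
class Node:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    mass: float = 1.0


def explicit_step(nodes, forces, dt):
    """Advance all nodes by one explicit time step.

    forces[i] is the unbalanced force (fx, fy) on node i, i.e. the sum of the
    internal (finite element), cohesive, contact and external forces assembled
    beforehand (assembly is not shown here).
    """
    for node, (fx, fy) in zip(nodes, forces):
        ax, ay = fx / node.mass, fy / node.mass  # Newton's second law
        node.vx += ax * dt                       # new velocity
        node.vy += ay * dt
        node.x += node.vx * dt                   # new position uses the new velocity
        node.y += node.vy * dt
```

For example, a node of mass 2 under a force (4, 0) with dt = 0.1 ends the step with velocity (0.2, 0) and position (0.02, 0).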

Fig. 1 Continuum discretized into triangular elastic elements and quadrangular cohesive elements

In FDEM, a model is discretized into triangular finite elements, and a zero-thickness four-node cohesive element (4-NCE) with coinciding edges is then inserted between them. Fracture initiation and propagation are characterized by the failure of the 4-NCEs, so that fractures propagate freely within the constraints of the mesh topology. The essential theories of FDEM include the elastic deformation and fracture of finite elements, contact judgment between discrete elements, contact interaction, and friction. The key principles and parameter values of FDEM are described below.

2.1.1 Fracture model in FDEM

The transition from continuous to discontinuous behavior is a process of fracture and fragmentation, which FDEM achieves through the failure of the cohesive elements on the boundaries between triangular elements. Material fracture initiates with the alteration, damage, yielding, and destruction of the microstructure; as a result, the mechanical properties at the crack tip also change. Within FDEM, concepts from linear and nonlinear elastic fracture mechanics (LEFM–NLEFM) are used as the crack-opening approach for Mode I once the material tensile strength is exceeded at the crack tip. A fracture process zone (FPZ) forms as inelastic cracks propagate, and Mode II behavior is simulated analogously to the FPZ behavior. Fracture initiation and propagation can be evaluated via LEFM procedures when the FPZ size is substantially smaller than the characteristic structural dimension; LEFM is appropriately modified on the basis of NLEFM because most of the domain remains elastic except for a small zone of plastic deformation at the crack tip. The FPZ is represented by 4-NCEs in FDEM. A 4-NCE may yield and fracture in Mode I, Mode II, or mixed Mode I–II, depending on the local stresses and the displacements of its edges (see Fig. 2).

Fig. 2 Fracture model of cohesive elements: a theoretical model; b numerical model in FDEM (Mode I and Mode II)

In computation, a 4-NCE can also undergo mixed Mode I–II fracturing, in which the resultant crack displacement is large although the opening o and slip s remain below \(o_r\) and \(s_r\), respectively. Mixed Mode I–II failure is adopted when the opening and slip fracture criteria are jointly satisfied (Mahabadi et al. 2012); the element is then subjected to both tensile and shear stresses, and failure is described by the elliptical equation:

$$ \left( {\frac{{o - o_{p} }}{{o_{r} - o_{p} }}} \right)^{2} + \left( {\frac{{s - s_{p} }}{{s_{r} - s_{p} }}} \right)^{2} = 1 $$
(1)

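where \(o_p\) and \(s_p\) are the opening and slip at which the tensile strength \(f_t\) and shear strength \(f_s\) are reached, and \(o_r\) and \(s_r\) are the residual opening and slip, given by: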

$$ \left\{ \begin{array}{l} o_{r} = \max\left(2o_{p},\; o_{p} + 3G_{fI}/f_{t}\right) \\ s_{r} = \max\left(2s_{p},\; s_{p} + 3G_{fII}/f_{s}\right) \end{array} \right. $$
(2)

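Thus, \(o_r\) is determined by \(f_t\) and \(G_{fI}\), and \(s_r\) by \(f_s\) and \(G_{fII}\). The fracture energies \(G_{fI}\) and \(G_{fII}\) are the energies required for cracking and equal the areas under the constitutive curves of the 4-NCE; with σ(o) and τ(s) denoting the normal and shear stresses of the element and \(f_r\) the residual frictional stress, this gives: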

$$ \left\{ \begin{array}{l} G_{fI} = \displaystyle\int_{o_{p}}^{o_{r}} \sigma(o)\,\mathrm{d}o \\ G_{fII} = \displaystyle\int_{s_{p}}^{s_{r}} \left[\tau(s) - f_{r}\right]\mathrm{d}s \end{array} \right. $$
(3)
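Eqs. (1)–(3) can be exercised with the short Python sketch below. The function names are hypothetical; clamping o and |s| at their peak values in `mixed_mode_index` is our assumption; and `fracture_energy` integrates a sampled softening curve with the trapezoidal rule rather than reproducing any particular softening function used in FDEM codes.

```python
def residual_displacements(op, sp, ft, fs, GfI, GfII):
    """Eq. (2): residual opening o_r and residual slip s_r of a 4-NCE."""
    o_r = max(2.0 * op, op + 3.0 * GfI / ft)
    s_r = max(2.0 * sp, sp + 3.0 * GfII / fs)
    return o_r, s_r


def mixed_mode_index(o, s, op, o_r, sp, s_r):
    """Left-hand side of eq. (1); the 4-NCE is treated as broken once it reaches 1.

    Assumption: only opening and |slip| beyond the peak values op and sp count,
    so an element still in its elastic range returns 0.
    """
    dn = max(o - op, 0.0) / (o_r - op)
    ds = max(abs(s) - sp, 0.0) / (s_r - sp)
    return dn * dn + ds * ds


def fracture_energy(disp, stress, residual=0.0):
    """Trapezoidal estimate of eq. (3) from a sampled softening curve.

    disp: openings (or slips) sampled from o_p to o_r; stress: cohesive stress at
    each sample; residual: f_r for Mode II (0 for Mode I).
    """
    g = 0.0
    for k in range(1, len(disp)):
        width = disp[k] - disp[k - 1]
        g += 0.5 * ((stress[k] - residual) + (stress[k - 1] - residual)) * width
    return g


# Hypothetical values in SI units, used only to exercise the formulas.
o_r, s_r = residual_displacements(op=1e-5, sp=2e-5, ft=5e6, fs=10e6, GfI=10.0, GfII=100.0)
print("o_r =", o_r, "s_r =", s_r)
print("index =", mixed_mode_index(o=1.6e-5, s=3.6e-5, op=1e-5, o_r=o_r, sp=2e-5, s_r=s_r))
```

With these inputs, the first branch of the max in eq. (2) governs \(o_r\) and the second governs \(s_r\), which shows why both terms are needed.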

2.1.2 Munjiza-NBS method

The NBS method divides the computational domain into grids of equal size (see Fig. 3), and the grid size is determined by the circumscribed-circle diameter of the largest discrete element. Contact retrieval is performed in two steps. ① Map all discrete elements onto the grids: integerize the centroid coordinates of each discrete element and assign it to the corresponding grid according to its integer coordinates:

$$ \left\{ \begin{array}{l} {}^{\text{int}}x_{i} = 1 + \operatorname{Int}\left(\dfrac{x_{i} - x_{\min}}{d}\right) \\ {}^{\text{int}}y_{i} = 1 + \operatorname{Int}\left(\dfrac{y_{i} - y_{\min}}{d}\right) \end{array} \right. $$
(4)

where \({}^{\text{int}}x_{i}\) and \({}^{\text{int}}y_{i}\) are the integerized grid coordinates of element i; \(x_{i}\) and \(y_{i}\) are the centroid coordinates of element i; \(x_{\min}\) and \(y_{\min}\) are the minimum values of all discrete-element centroid coordinates; Int(·) takes the integer part; and \(d\) is the grid size.

Fig. 3 Element-to-grid mapping and element contact relationships (the grids adjacent to the middle grid are shaded dark; the large circle is the grid, and the black circle is the center)

② Search the element's own grid and the adjacent grids for discrete elements that may be in contact.
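The two NBS steps can be sketched with a simple cell hash. This is a simplified illustration under our own assumptions, not Munjiza's linked-list NBS implementation: `map_to_cells` applies eq. (4), and `potential_pairs` checks each cell against itself and four neighbours (a five-cell stencil) so that every neighbouring cell pair is visited once. The resulting couples still require an actual contact check.

```python
from collections import defaultdict

# Own cell plus four neighbours: for every neighbouring cell pair, exactly one
# of the two offsets +/-(dx, dy) appears here, so each pair is visited once.
STENCIL = ((0, 0), (1, 0), (-1, 1), (0, 1), (1, 1))


def map_to_cells(centroids, d):
    """Step 1, eq. (4): integerize centroids into square cells of size d (1-based)."""
    x_min = min(x for x, _ in centroids)
    y_min = min(y for _, y in centroids)
    cells = defaultdict(list)
    for e, (x, y) in enumerate(centroids):
        cells[(1 + int((x - x_min) / d), 1 + int((y - y_min) / d))].append(e)
    return cells


def potential_pairs(cells):
    """Step 2: pair each element with elements in its own and neighbouring cells."""
    pairs = set()
    for (ix, iy), elems in cells.items():
        for dx, dy in STENCIL:
            # .get() does not insert keys, so iterating cells.items() stays safe
            for a in elems:
                for b in cells.get((ix + dx, iy + dy), ()):
                    if a != b:
                        pairs.add((min(a, b), max(a, b)))
    return pairs


cells = map_to_cells([(0.0, 0.0), (0.5, 0.2), (3.0, 3.0), (1.2, 0.1)], d=1.0)
print(sorted(potential_pairs(cells)))
```

Here elements 0 and 1 share a cell, element 3 lies in the neighbouring cell, and element 2 is isolated, so only couples among 0, 1, and 3 are produced.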

2.1.3 Penalty function

In the penalty function method, two bodies in contact are allowed to penetrate each other slightly, and the penetration generates contact forces. The standard penalty-function contact potential is expressed as:

$$ U_{c} = \int_{\Gamma_{c}} \frac{1}{2}\, p \left(\mathbf{r}_{t} - \mathbf{r}_{c}\right)^{T} \left(\mathbf{r}_{t} - \mathbf{r}_{c}\right) \mathrm{d}\Gamma $$
(5)

where \(U_{c}\) is the contact penetration potential; \(\Gamma_{c}\) is the boundary of the region where the contacting elements overlap; \(p\) is the penalty parameter; and \(\mathbf{r}_{c}\) and \(\mathbf{r}_{t}\) are the position vectors of corresponding points on the overlap boundary of the contact and target elements, respectively.
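For a discretized overlap boundary, the integral in eq. (5) can be approximated by a weighted sum over paired boundary points. The sketch assumes the boundary has already been sampled into point pairs with quadrature weights (segment lengths); `penalty_potential` is a hypothetical helper, not part of any FDEM code.

```python
def penalty_potential(p, samples):
    """Approximate eq. (5) by quadrature along the overlap boundary.

    samples: iterable of (r_t, r_c, w), where r_t and r_c are paired boundary
    points (x, y) of the target and contact elements and w is the boundary
    length (quadrature weight) assigned to that pair.
    """
    u = 0.0
    for (tx, ty), (cx, cy), w in samples:
        dx, dy = tx - cx, ty - cy
        u += 0.5 * p * (dx * dx + dy * dy) * w  # (r_t - r_c)^T (r_t - r_c) term
    return u


# Two boundary samples with penetrations of 1.0 and 0.5 (hypothetical values).
print(penalty_potential(2.0, [((1.0, 0.0), (0.0, 0.0), 1.0),
                              ((0.0, 0.5), (0.0, 0.0), 2.0)]))
```

Because the integrand is quadratic in the penetration, halving the penetration reduces its contribution by a factor of four, which is why the penalty parameter p must be large to keep overlaps small.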

2.1.4 Distributed contact force

FDEM employs the penalty function to solve for the interaction force between contact pairs. The general penalty function method calculates a concentrated force (see Fig. 4a) that considers only the penetration displacement of the contact pair. This approach is comparatively simple but differs considerably from the actual situation, so it is not sufficiently precise. The distributed contact force used in FDEM accounts for the size and shape of the overlap boundary between the contact pairs, which is more reasonable.

Fig. 4 Contact force: a distributed contact force; b centralized contact force

FDEM employs constant-strain triangular finite elements, which simplifies both contact judgment and the deformation calculation of contacting elements. The two elements of a contact pair are denoted the contact element and the target element; their overlapping area is denoted \(S\) and its boundary \(\Gamma\). The infinitesimal force exerted by the target element on the contact element through an area element \(dA\) of the overlap is:

$$ d{\mathbf{f}} = \left[ {grad\varphi_{c} \left( {{\mathbf{P}}_{{\mathbf{c}}} } \right) - grad\varphi_{t} \left( {{\mathbf{P}}_{{\mathbf{t}}} } \right)} \right]dA $$
(6)

where \(d{\mathbf{f}}\) is the infinitesimal force generated by the area element \(dA\); \({\mathbf{P}}_{{\mathbf{c}}}\) and \({\mathbf{P}}_{{\mathbf{t}}}\) are the positions of \(dA\) in the contact element and the target element, respectively; and \(\varphi_{c}\) and \(\varphi_{t}\) are the potentials of the contact element and the target element at that point.
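Eq. (6) requires a potential function for each triangle. A common choice in the FDEM literature (e.g., Munjiza 2004) is \(\varphi(P) = 3\min_{i} A_{i}/A\), where \(A_{i}\) are the areas of the sub-triangles formed by P and each edge and A is the triangle area, so that φ = 1 at the centroid and φ = 0 on the boundary. The sketch below assumes this choice, which the text above does not state, and estimates grad φ by central differences; all names are hypothetical.

```python
def signed_area(p, q, r):
    """Signed area of triangle pqr (positive if counter-clockwise)."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def potential(tri, pt):
    """Assumed triangle potential phi = 3 * min_i(A_i / A); negative outside."""
    a, b, c = tri
    area = signed_area(a, b, c)
    sub = (signed_area(pt, b, c), signed_area(a, pt, c), signed_area(a, b, pt))
    # Dividing each sub-area by the signed total area makes the result
    # independent of the vertex ordering (clockwise or counter-clockwise).
    return 3.0 * min(s / area for s in sub)


def grad_potential(tri, pt, eps=1e-7):
    """Central-difference estimate of grad phi (phi is piecewise linear)."""
    x, y = pt
    gx = (potential(tri, (x + eps, y)) - potential(tri, (x - eps, y))) / (2 * eps)
    gy = (potential(tri, (x, y + eps)) - potential(tri, (x, y - eps))) / (2 * eps)
    return gx, gy


tri = ((0.0, 0.0), (3.0, 0.0), (0.0, 3.0))
print(potential(tri, (1.0, 1.0)), grad_potential(tri, (0.5, 1.0)))
```

In eq. (6), the gradients of the two elements' potentials would be evaluated at each point of the overlap S and their difference integrated over S; because φ is piecewise linear, its gradient is constant within each of the three sub-regions of a triangle.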

For further details of the basic theory of FDEM, readers are referred to the relevant literature (Liu et al. 2020; Ma et al. 2022; Munjiza 2004; Wang et al. 2022a) and the references therein.

2.2 New algorithms for contact determination in GPU parallel FDEM

The significant differences between serial and parallel programs motivated new, more efficient contact detection algorithms for GPU parallel FDEM. Moreover, contact determination is the most computationally demanding part. In the original serial FDEM based on Munjiza's NBS algorithm, actual contact is determined by checking whether the edges of the contactor element (C) intersect the edges of the target element (T). To improve determination efficiency, a new approach named the "coefficient limit method" is proposed and implemented in our CUDA-based parallel FDEM.

There are two contact cases for a pair consisting of a contact element and a target element. In the first case, at least one node of one element lies inside the other element (e.g., node D of C lies in T; see Fig. 5a). In the second case, the edges of the two elements intersect, but no node of either element lies inside the other (see Fig. 5b).

Fig. 5 Two cases of contact

In the first case, node D of the contact element lies in the target element, and vector AD is decomposed along the edges AB and AC of the target element:

$$ {\mathbf{AD}} = x \cdot {\mathbf{AB}} + y \cdot {\text{ }}{\mathbf{AC}} $$
(7)

where AD, AB, and AC are vectors, and x and y are the decomposition coefficients.

In the second case, node F of the contact element and node B of the target element each lie outside the other element, and the corresponding decompositions are:

$$ \left\{ {\begin{array}{*{20}l} {{\mathbf{AF}} = x \cdot {\mathbf{AB}} + y \cdot {\mathbf{AC}}} \hfill \\ {{\mathbf{DB}} = m \cdot {\mathbf{DE}} + n \cdot {\mathbf{DF}}} \hfill \\ \end{array} } \right. $$
(8)

where DB, DE, and DF are vectors, and m and n are the corresponding coefficients.

Within one time step, the intersection area of the two elements is relatively small and the nodes are close to each other. Therefore, a conservative critical value of 0.1 can be set for y and m. The necessary and sufficient condition for node D to lie in the target element is:

$$ \left\{ {\begin{array}{*{20}l} {0.0 \le x \le 1.0} \hfill \\ {0.0 \le y \le 1.0} \hfill \\ {0.0 \le x + y \le 1.0} \hfill \\ \end{array} } \right. $$
(9)

Under this assumption, the condition for the two elements to overlap is taken as:

$$ \left\{ {\begin{array}{*{20}l} {1.0 \le x + y \le 1.1} \hfill \\ {0.0 \le y \le 0.1} \hfill \\ {1.0 \le m + n \le 1.1} \hfill \\ {0.0 \le m \le 0.1} \hfill \\ \end{array} } \right. $$
(10)

The "coefficient limit method" is likewise valid for identifying the relationships between the remaining nodes of the elements. Because x, y, m, and n are simple to calculate, contact determination with this method is highly efficient.
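The coefficient limit method reduces to a 2 × 2 linear solve per node plus a few comparisons. The Python sketch below solves eqs. (7)–(8) by Cramer's rule and applies the limits of eqs. (9)–(10). The vertex labels assume a target element ABC and a contact element DEF, as eqs. (7)–(8) suggest, and the function names are hypothetical.

```python
def coefficients(a, b, c, d):
    """Solve AD = x*AB + y*AC for (x, y) by Cramer's rule (eqs. 7 and 8)."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    acx, acy = c[0] - a[0], c[1] - a[1]
    adx, ady = d[0] - a[0], d[1] - a[1]
    det = abx * acy - aby * acx  # twice the signed area of triangle ABC
    x = (adx * acy - ady * acx) / det
    y = (abx * ady - aby * adx) / det
    return x, y


def node_inside(x, y):
    """Eq. (9): the node lies in the target triangle."""
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 and 0.0 <= x + y <= 1.0


def edges_overlap(x, y, m, n, limit=0.1):
    """Eq. (10) with the conservative limit 0.1 on y and m."""
    return (1.0 <= x + y <= 1.0 + limit and 0.0 <= y <= limit
            and 1.0 <= m + n <= 1.0 + limit and 0.0 <= m <= limit)


A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(node_inside(*coefficients(A, B, C, (0.25, 0.25))))  # D inside ABC
print(node_inside(*coefficients(A, B, C, (0.6, 0.6))))    # D outside ABC
```

Each test needs only a few multiplications, one division, and comparisons, with no branching on edge geometry, which suits one-thread-per-couple execution on a GPU.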

3 Model parameters

3.1 Loading rate

To keep stress-wave oscillation in the model sufficiently small, the loading rate \(\dot{k}\) should be relatively small. A large \(\dot{k}\) leads to intensive, unrealistic fracturing of the specimen and overestimation of strength, whereas a smaller \(\dot{k}\) requires longer computation time. Refs. (Lisjak et al. 2014; Mahabadi 2012b; Mahabadi et al. 2010a; Mahabadi and Grasselli 2009; Mahabadi et al. 2015; Tatone 2014; Tatone and Grasselli 2015) analyzed the effect of \(\dot{k}\) on UCS and BD simulation results for specific sample sizes. Table 1 summarizes the loading rates used in laboratory-scale FDEM simulations and shows that they span a wide range for models of different sizes, from a minimum of 0.25 × 10−3 m s−1 (Piovano 2012) to a maximum of 1.0 m s−1 (Mahabadi et al. 2014a).

Table 1 Statistical table of simulated sample parameters

The calibration procedure serves to determine FDEM parameters. UCS modeling is a feasible and effective approach to calibrating FDEM parameters (Lisjak et al. 2012), as it comprehensively reflects the macroscopic response of rock under the combined action of tension and compression. The calibration model and boundary conditions are as follows. The ISRM-recommended standard UCS specimen size of Φ50 × 100 mm is adopted in this study, which reduces influencing factors and facilitates comparison with laboratory results. The effect of \(\dot{k}\) on fracturing is evaluated from the macroscopic fracture mode and the number of cracks in the sample, which serves as the basis for comparison with the final fracture mode. The two-dimensional (2D) UCS model is presented in Fig. 6. An irregular mesh is used, with mesh sizes below 0.001 m (0.0006 m). To reflect different loading rates, the top and bottom boundaries of the sample are loaded at rates of 0.05, 0.1, 0.2, and 0.4 m s−1.

Fig. 6 Model of UCS (units: m)

The fracture modes of standard-size samples under different \(\dot{k}\) are presented in Fig. 7, where tensile fractures (Mode I) are shown in red and shear-slip fractures (Mode II) in blue. As \(\dot{k}\) increases, the UCS fracture mode changes and cracking becomes excessive. When \(\dot{k}\) increases to 0.2 m s−1, the fracture mode differs significantly and departs from the experimentally observed result, whereas at 0.1 m s−1 or below the fracture mode is normal. Therefore, a loading rate of 0.1 m s−1 meets the requirement of laboratory-scale simulation.

Fig. 7 Fracture modes of Φ50 mm rock UCS samples at different \(\dot{k}\): a 0.05 m s−1; b 0.1 m s−1; c 0.2 m s−1; d 0.4 m s−1

3.2 Time step

The time step plays an important role in simulation efficiency and stability. It controls the updates of nodal velocity and displacement and, for a fixed total simulated time, determines the number of calculation steps. Small time steps make the simulation costly, whereas large time steps degrade computational accuracy or even cause non-convergence.

In DEM, \(\Delta t\) should satisfy two conditions: ① the distance traveled by the stress wave within one time step should be less than the mesh size; and ② the momentum transferred in one time step must not exceed the total momentum transferred during a collision, modeled as a single-degree-of-freedom spring oscillator. The time steps corresponding to the two conditions are:

$$ \left\{ {\begin{array}{*{20}l} {\Delta t = \frac{h}{{\sqrt {E/\rho } }}} \hfill \\ {\Delta t = \frac{2}{{\sqrt {k_{c} /m} }}} \hfill \\ \end{array} } \right. $$
(11)

where h is the mesh size, \(k_c\) is the contact stiffness, and m is the mass. Guo (2015) suggested that a stable time step should account for both the FEM stress calculation and the DEM contact calculation, taking the smaller of the two:

$$ \left\{ \begin{array}{l} \Delta t_{\mathrm{FEM}} \approx \dfrac{h}{10}\sqrt{\rho/E} \\ \Delta t_{\mathrm{DEM}} \approx \dfrac{\pi}{5}\sqrt{m/k_{c}} \\ \Delta t_{\mathrm{FDEM}} \approx \min\left\{\Delta t_{\mathrm{FEM}},\, \Delta t_{\mathrm{DEM}}\right\} \end{array} \right. $$
(12)

Simulation results show that the time step should be far below the calculated \(\Delta t\) to guarantee computational accuracy. For contact between discrete elements, the transition from intact to fractured rock should span dozens of time steps. When the mesh size is small, the time step must be correspondingly small (sometimes down to the order of \(10^{-9}\) s), so the total number of time steps can reach millions. Subject to ensuring calculation accuracy, the largest feasible time step should be used to save simulation cost.
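Eq. (12) is straightforward to evaluate. The inputs in this Python sketch are hypothetical, chosen only to show the order of magnitude obtained for a sub-millimetre mesh; as noted above, the time step actually used should be well below this estimate.

```python
import math


def stable_time_step(h, E, rho, m, kc):
    """Eq. (12): the smaller of the FEM and DEM time-step estimates (SI units)."""
    dt_fem = h / 10.0 * math.sqrt(rho / E)
    dt_dem = math.pi / 5.0 * math.sqrt(m / kc)
    return min(dt_fem, dt_dem), dt_fem, dt_dem


# Hypothetical inputs: 0.6 mm mesh, E = 30 GPa, rho = 2700 kg/m^3,
# mass 1e-3 kg and contact stiffness 3e11 N/m.
dt, dt_fem, dt_dem = stable_time_step(h=6e-4, E=30e9, rho=2700.0, m=1e-3, kc=3e11)
print(f"dt_FEM = {dt_fem:.2e} s, dt_DEM = {dt_dem:.2e} s, dt_FDEM = {dt:.2e} s")
```

For these values the FEM estimate governs and is of the order of 10−8 s, consistent with the very small time steps described above.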

3.3 Viscous damping μ

FDEM is essentially a dynamic computation tool. Nodal velocities and displacements are obtained by a first-order forward-difference integration of the accelerations computed from the unbalanced nodal forces. When a quasi-static rock fracture process is modeled, stress waves in the model oscillate violently with large nodal velocities, producing a dynamic rock response far from the actual one; sometimes the dynamic response even generates large numbers of macroscopic tensile cracks under macroscopic compression. To simulate a quasi-static process with a dynamic method, dynamic effects should be minimized by dissipating stress waves as quickly as possible and keeping model velocities low during loading. In FDEM, energy is dissipated by applying viscous damping μ to the triangular elements: a damping force is calculated from μ and the deformation-rate tensor of each triangular element. Munjiza and John (2002) presented the following estimate of the critical viscous damping:

$$ \Delta \mu \approx 2h\sqrt {E\rho } $$
(13)

The damping μ is related to the time step and affects the total number of time steps. A large μ requires a reduced time step, and hence more time steps, to keep the calculation stable. When μ is far less than \(\Delta \mu\), more computation time is needed to bring the system to equilibrium. μ may also affect the fracture mode. When simulating rock fracture, μ and the time step should be considered together to improve calculation efficiency while ensuring computational accuracy. A value slightly less than \(\Delta \mu\) is advisable for general simulations.
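Eq. (13) can be evaluated directly, as in the Python sketch below. The material values are hypothetical, and the factor 0.9 used to pick a value "slightly less than" \(\Delta \mu\) is our assumption, not a value given in the text.

```python
import math


def critical_damping(h, E, rho):
    """Eq. (13): critical viscous damping, approximately 2 h sqrt(E rho), in Pa*s."""
    return 2.0 * h * math.sqrt(E * rho)


# Hypothetical inputs: 0.6 mm mesh, E = 30 GPa, rho = 2700 kg/m^3.
mu_crit = critical_damping(6e-4, 30e9, 2700.0)
mu = 0.9 * mu_crit  # assumption: "slightly less than" taken here as 90 %
print(f"mu_crit = {mu_crit:.0f} Pa*s, chosen mu = {mu:.0f} Pa*s")
```

Because \(\Delta \mu\) scales linearly with h, the damping must be rescaled whenever the mesh is refined or coarsened.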

4 GPU parallelized computing and excavation simulation method

4.1 GPU parallelized computing of FDEM

Low computational efficiency limits the accuracy of simulation results. Large-scale studies with FDEM (such as deep tunnel excavation) involve extensive contact searches, so improving computational efficiency is key to reducing the computation time. Various CPU-parallel and GPU-parallel solutions have been proposed to increase computational efficiency. CPU parallel computing mainly uses the Message Passing Interface (MPI), Parallel Virtual Machine (PVM), and Open Multi-Processing (OpenMP), whereas GPU parallel computing is based on the Compute Unified Device Architecture (CUDA) or the Open Computing Language (OpenCL) (Fukuda et al. 2019; Lisjak et al. 2018; Liu et al. 2020; Munjiza et al. 2012). Because FDEM uses an explicit solution, the computation requires little memory and the program is easy to parallelize. Parallel computation therefore dramatically improves calculation efficiency and significantly extends the capability of FDEM, so that large-scale engineering problems can be solved.

A GPU-parallelized FDEM code based on the original Y-code was developed using NVIDIA's CUDA technology (Liu et al. 2020). By exploiting massive fine-grained parallelism on GPUs installed in a single PC, the GPU-parallelized FDEM runs hundreds of times faster. The speedup ratio depends mainly on the GPU hardware, the number of elements in the model, and the degree of fracturing and the nonuniformity of discrete element sizes. Moreover, the Mohr–Coulomb failure criterion, a quasi-static friction law, absorbing boundary conditions, mass-proportional damping, in-situ stress initialization, an excavation simulation method, and material heterogeneity models were implemented in the CUDA version of FDEM for rock mechanics and engineering modeling. In addition, the significant differences between serial and parallel programs motivated new, more efficient contact detection algorithms for GPU parallel FDEM; these new contact algorithms are described in detail to emphasize their novelty. For other details of the CUDA parallel FDEM program, please refer to "Parallelized Combined FDEM Procedure Using Multi-GPU with CUDA" (Liu et al. 2020).

4.1.1 Improved contact calculation procedure in CUDA

FDEM simulation of large-scale problems involves a large number of contact searches, which require efficient and robust contact detection algorithms. In the original Y-code, contact search consists of the following steps: (1) divide the computational domain into identical square grids based on the largest mesh size; (2) map the triangular elements onto the grids based on their current centroid coordinates; (3) determine the potential contact couples from the centroid distance and the representative radii of the two elements; and (4) determine the actual contacts by checking whether the three edges of one triangular element intersect the other, and calculate the contact potential and contact forces if contact actually occurs. This procedure is inefficient for parallel computing: the two separate determinations in steps (3) and (4) waste execution threads, and for potential couples that are not in contact there is no need to calculate the contact potential and contact forces. Therefore, with each thread responsible for actual contacts, steps (3) and (4) are merged into one kernel function that determines actual contact directly in a single step (Kernel 3 in Fig. 8). The next kernel function (Kernel 4 in Fig. 8) calculates the contact potential and contact forces for the actual contacts only. In this way, the load imbalance problem is avoided.
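The merged step can be emulated on the CPU as below, with one loop iteration standing in for one GPU thread. This is a hedged sketch, not the CUDA kernels of Fig. 8: the representative radius (largest centroid-to-node distance) and the separating-axis overlap test used in place of the edge-intersection check are our assumptions, and all names are hypothetical.

```python
import math


def centroid(tri):
    return (sum(p[0] for p in tri) / 3.0, sum(p[1] for p in tri) / 3.0)


def radius(tri):
    # Assumed representative radius: largest centroid-to-node distance.
    c = centroid(tri)
    return max(math.dist(c, p) for p in tri)


def _separated(a, b):
    # Separating-axis test on the edge normals of triangle a.
    for i in range(3):
        (x1, y1), (x2, y2) = a[i], a[(i + 1) % 3]
        nx, ny = y1 - y2, x2 - x1  # normal of edge i (not normalized)
        pa = [nx * x + ny * y for x, y in a]
        pb = [nx * x + ny * y for x, y in b]
        if max(pa) < min(pb) or max(pb) < min(pa):
            return True
    return False


def merged_contact_kernel(tris, couples):
    """Steps (3) and (4) in one pass; returns only couples in actual contact."""
    actual = []
    for i, j in couples:  # on the GPU: one thread per candidate couple
        a, b = tris[i], tris[j]
        if math.dist(centroid(a), centroid(b)) > radius(a) + radius(b):
            continue  # rejected by the cheap centroid-distance check
        if not (_separated(a, b) or _separated(b, a)):
            actual.append((i, j))  # compacted list passed to the force kernel
    return actual
```

Because the output list holds only actual contacts, the subsequent force kernel can assign one thread per entry without idle threads; on a GPU, the append would typically be replaced by an atomic counter or a stream-compaction step.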

Fig. 8 Improved CUDA contact calculation flow (in principle, one block handles one grid to reduce data transfer, and one thread handles one element; the elements in a grid are searched by the threads synchronously)

4.1.2 Contact search method for uneven mesh sizes

In the original serial FDEM, the grid size depends on the size of the largest discrete element, so a grid may contain several elements. When element sizes are uneven, the number of elements belonging to one grid may be quite large, the time spent on contact determination increases significantly, and a severe load imbalance may arise among threads. For instance, for element X in grid A (see Fig. 9a), contact detection is performed between X and the elements in the five dark grids. If grid A contains too many elements, the workload of grid A in the potential-contact determination step grows rapidly (roughly quadratically) with the number of elements, and the search efficiency decreases significantly. The computational cost becomes unacceptable under this condition.

Fig. 9 Contact search method: a sequential method; b improved method for uneven sizes

Unlike the original serial FDEM code, the improved contact search algorithm adopts a different domain-division and mapping approach to enable efficient contact search with uneven mesh sizes in large-scale simulations. In the improved method, the grid size is determined by the smallest mesh size (see Fig. 9b) rather than the largest element size, which prevents a large number of elements from falling into one grid. All elements are then mapped onto the grids based on their coordinates. Because the grid size is only slightly larger than the smallest mesh size, no grid contains many elements, although the number of grids is quite large. A grid may contain no element (e.g., grid (0, 0)), one element (e.g., grid (7, 4) contains element B), or more than one element (e.g., grid (1, 1) contains elements C and D, and grid (4, 3) contains elements B, C, and D). Contact search is conducted with one block per grid. For grids containing no element or only one element, no contact search is needed; for grids containing more than one element, the potential contact couples are determined, and actual contact is then determined by examining whether one element of each potential couple actually contacts the other. In CUDA programming, one block serves one grid to reduce data transfer, better exploit the hardware, and optimize overall performance; one thread handles one element, and the elements searched by the threads in a grid are processed synchronously. Because CUDA can employ a large number of blocks and threads on the GPU, the contact search of each element can be assigned to its own thread. The load on each thread is therefore light, and each block can increase or decrease the number of threads used according to its load. Consequently, contact search is significantly accelerated by multithreaded, fine-grained parallel computing.
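The fine-grid mapping can be sketched as follows. We assume each element is registered in every cell touched by its bounding box, which is consistent with one element appearing in several grids in Fig. 9b but is not stated in the text. Each loop iteration over cells stands in for one CUDA block, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def fine_grid(tris, cell):
    """Map each element to every cell its bounding box touches (assumed rule)."""
    grid = defaultdict(list)
    for e, tri in enumerate(tris):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        for ix in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
            for iy in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
                grid[(ix, iy)].append(e)
    return grid


def candidate_couples(grid):
    """One 'block' per cell; cells holding fewer than two elements are skipped."""
    couples = set()  # a couple sharing several cells is recorded only once
    for elems in grid.values():
        if len(elems) < 2:
            continue
        couples.update(combinations(sorted(elems), 2))
    return couples


tris = [((0.0, 0.0), (0.1, 0.0), (0.0, 0.1)),
        ((0.05, 0.05), (0.15, 0.05), (0.05, 0.15)),
        ((1.01, 1.01), (1.05, 1.01), (1.01, 1.05))]
print(candidate_couples(fine_grid(tris, cell=0.2)))
```

The candidate couples would then go through the actual contact check; because cells are small, each cell holds only a few elements, which keeps the per-block work bounded even when element sizes vary widely.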

4.2 Excavation simulation method in 2D FDEM

Underground excavation simulation can be divided into three steps: in-situ stress initialization, excavation simulation, and support. The in-situ stress initialization method and the 2D excavation method implemented in GPU parallel FDEM are described here.

During tunnel excavation, the stress near the working face exhibits a 3D distribution. As the tunnel face advances, stress redistributes around the opening section and affects a zone several times the tunnel diameter. Meanwhile, the support pressure provided by the rock mass in the opening section decreases from the in-situ stress to zero far behind the tunnel face. At a certain distance ahead of the working face, the stress equals the in-situ stress and is undisturbed. The rock mass at the working face supports the surrounding rock within a nonzero range ahead of and behind the face, and this support force is approximately 25% of the original in-situ stress (Vlachopoulos and Diederichs 2009), so that the excavated surrounding rock can maintain short-term stability and time is available for temporary support at the working face. As the working face opens, displacement ahead of the face and stress redistribution occur, and the support force of the rock mass behind the working face decreases continuously. That is, as the working face advances, the state of the rock mass changes continuously from far ahead of the working face (approximately 2.5 times the tunnel diameter) to far behind it (approximately 4.5 times the tunnel diameter).

When a 2D plane-strain model is used to simulate three-dimensional (3D) tunnel excavation, the support effect of the working face must be considered: because a 2D model is applied to a 3D problem, the support force provided by the rock at the working face has to be represented explicitly. Displacement occurs within a certain range ahead of the working face, accompanied by stress redistribution, and the surrounding rock within a certain range behind the face continues to deform after excavation while its stress keeps redistributing. This supporting effect is represented by gradually softening the strength of the elements in the excavation zone (H et al. 2003). An "excavation zone softening technique" was therefore applied in 2D FDEM, in which the 3D advance of the face is represented by progressively reducing Young's modulus (E) of the triangular elements in the opening zone. To improve precision and maintain a quasi-static loading rate, a linear or exponential function should be used to reduce Young's modulus of the opening zone during the excavation step, and each subsequent reduction should not proceed until the kinetic energy of the system has dissipated to a very small critical threshold.
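The two reduction laws mentioned above can be written as schedules for Young's modulus of the opening zone. This is a minimal sketch assuming a normalized intact modulus \(E_0 = 1\) and a fixed number of reduction steps; the function names are hypothetical. The linear law removes the same amount of stiffness in every step, whereas the exponential law removes the same fraction of the current modulus, so its absolute decrements are large at first and small near the end.

```python
def linear_schedule(e0, e_final, n_steps):
    """Modulus after each of n_steps equal decrements from e0 down to e_final."""
    de = (e0 - e_final) / n_steps
    return [e0 - de * (k + 1) for k in range(n_steps)]


def exponential_schedule(e0, e_final, n_steps):
    """Modulus after each of n_steps reductions by the same ratio, from e0 to e_final."""
    ratio = (e_final / e0) ** (1.0 / n_steps)
    return [e0 * ratio ** (k + 1) for k in range(n_steps)]


lin = linear_schedule(1.0, 1e-4, 10)
exp_ = exponential_schedule(1.0, 1e-4, 10)
for k, (a, b) in enumerate(zip(lin, exp_), start=1):
    print(f"step {k:2d}: linear {a:.4f}  exponential {b:.6f}")
```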

5 Modeling verification and optimization

5.1 Verification model and parameters

A circular water conveyance tunnel is taken as an example to illustrate the effectiveness and practicality of the model parameters, the boundary conditions, and the excavation simulation method in GPU parallel FDEM simulation of underground excavation.

The excavation model is presented in Fig. 10. It is used to simulate the effects of in-situ stress, excavation, and fracture of the surrounding rock at different time steps, as well as the final EDZ. The model measures 80 m × 80 m, with a 5-m-diameter tunnel at its center. The tunnel diameter follows the engineering design, and the model size was chosen to limit the influence of the boundaries on the post-excavation failure characteristics while keeping the computation efficient. An MRZ was used around the excavation boundary to better capture the fracture response near the excavated zone. The average mesh size in the MRZ is 0.05 m, and the numbers of triangular and cohesive elements are approximately 800,000 and 1,200,000, respectively; the number of quadrilateral and triangular elements in the model amounts to approximately 1 million. The rock is assumed to be isotropic in strength, and the measured mechanical properties in Table 2 are used directly as triangular element parameters. The other parameters were selected based on the test results.
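The ratio between the reported triangular and cohesive element counts follows from mesh topology when a cohesive element is inserted along every edge shared by two triangles, as is usual in FDEM. Each triangle has three edges and each interior edge is shared by two triangles, so a large mesh has approximately \(N_{coh} \approx 3N_{tri}/2\) cohesive elements, and \(1.5 \times 800{,}000 = 1{,}200{,}000\). The sketch below checks this ratio on a hypothetical structured mesh of n × n squares, each split into two triangles (the actual MRZ mesh is unstructured); the function name is an assumption.

```python
from collections import Counter


def mesh_counts(n):
    """Return (triangles, interior edges) for an n x n grid of unit squares split into triangles."""
    edge_use = Counter()
    n_tri = 0
    for i in range(n):
        for j in range(n):
            a, b, c, d = (i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)
            for tri in ((a, b, c), (a, c, d)):
                n_tri += 1
                for k in range(3):
                    edge = tuple(sorted((tri[k], tri[(k + 1) % 3])))
                    edge_use[edge] += 1
    # An edge used by two triangles is interior and would carry one cohesive element.
    interior = sum(1 for count in edge_use.values() if count == 2)
    return n_tri, interior


for n in (10, 100, 300):
    tri, coh = mesh_counts(n)
    print(n, tri, coh, round(coh / tri, 4))
```

The ratio approaches 1.5 from below because boundary edges carry no cohesive element and their share shrinks as the mesh grows.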

Fig. 10

Excavation model

Table 2 Mechanical properties of the malmstone

5.2 Boundary conditions and in-situ stress initialization method

In-situ stress plays an important role in rock mechanics, and its initialization is a prerequisite for simulating excavation. Calculating the in-situ stress field is of great practical significance for rock engineering applications such as underground excavation and slope stability. A predetermined in-situ stress must therefore be established in the model before excavation, and an in-situ stress initialization procedure suited to engineering simulation was added to the GPU parallel FDEM program. It consists of the following steps. First, the in-situ stress is converted into nodal forces on the finite elements and applied through stress boundary conditions; these nodal forces are kept constant until the kinetic energy of the system has dissipated to a stable value. Second, the outer boundary of the model is fixed, which releases the nodal forces and causes the kinetic energy to dissipate again. When the kinetic energy of the system again reaches a stable value, the in-situ stress initialization is complete. Notably, if a sufficiently large fracture penalty is used, the resulting stress is almost identical to the preset stress.

The in-situ stress of the excavation model is initialized with the above method as follows: the horizontal direction carries the major principal stress and the vertical direction the minor principal stress, both constant with depth. The horizontal stress is 26.5 MPa and the vertical stress is 22 MPa, giving a lateral pressure coefficient λ = 1.2.
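The first initialization step, converting the in-situ stress into nodal forces, can be sketched for a polygonal outer boundary. The sketch assumes a uniform stress state, a tension-positive sign convention (so compressive stresses are negative), plane strain per unit thickness, and boundary nodes listed counter-clockwise. The traction on each boundary edge is \(\mathbf{t} = \boldsymbol{\sigma}\mathbf{n}\), and each end node of a straight edge receives half of the edge resultant. The stresses follow the stated convention of a horizontal major principal stress with λ = 1.2 (26.5 MPa horizontal, 22 MPa vertical); the function name is hypothetical, and only the four corners of the 80 m × 80 m model are used for brevity.

```python
import math


def boundary_nodal_forces(nodes, sxx, syy, sxy=0.0):
    """Equivalent nodal forces (N per m thickness) from a uniform stress on a closed boundary.

    nodes: boundary coordinates in counter-clockwise order; stresses in Pa, tension positive.
    """
    forces = [[0.0, 0.0] for _ in nodes]
    m = len(nodes)
    for k in range(m):
        (x1, y1), (x2, y2) = nodes[k], nodes[(k + 1) % m]
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        nx, ny = dy / length, -dx / length        # outward unit normal for CCW ordering
        tx = sxx * nx + sxy * ny                  # traction t = sigma . n
        ty = sxy * nx + syy * ny
        for node in (k, (k + 1) % m):             # lump half of the edge resultant on each end node
            forces[node][0] += 0.5 * length * tx
            forces[node][1] += 0.5 * length * ty
    return forces


corners = [(-40.0, -40.0), (40.0, -40.0), (40.0, 40.0), (-40.0, 40.0)]
f = boundary_nodal_forces(corners, sxx=-26.5e6, syy=-22.0e6)
print(f[1])                                            # bottom-right corner
print(sum(fx for fx, _ in f), sum(fy for _, fy in f))  # resultant of a uniform stress: ~0
```

In the model itself, the boundary would be discretized into many nodes; the same loop applies unchanged, and the zero resultant is a quick check that the applied forces are self-equilibrated.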

5.3 Optimization and improvement

To systematically explore how the loading–unloading rate per time step and the reduction rate of the opening zone's Young's modulus in the excavation simulation method affect the fracture mode and failure mechanism of the rock mass around the excavated tunnel, the original FDEM program and the model input parameters were optimized iteratively, progressively improving the simulation of deep tunnel excavation.

The following figures show the surrounding rock in the immediate vicinity of the simulated tunnel. In all cases, Young's modulus of the opening zone is reduced with a linear function for unloading.

5.3.1 Original fracture characteristics of tunnel

The fracture process and EDZ obtained with the original settings are presented in Fig. 11, where black lines indicate shear fractures. The simulation results show the following characteristics: the fractures are blurred and appear mainly as fracture zones of a certain thickness along the main cracks; although the in-situ stress is symmetrical, the cracks are asymmetrical; some cracks propagate far and independently, and the cracks are irregular; and the fracture zone perpendicular to \(\sigma_{1}\) is underdeveloped, while some isolated cracks parallel to \(\sigma_{1}\) penetrate deep into the surrounding rock.

Fig. 11

Original fracture characteristics of tunnel

The differences in the simulated results are mainly caused by the unloading rate (\(\dot{k}\)). An unreasonable unloading rate causes large vibrations in the model, so the fracture pattern is strongly affected by the mesh shape. To make the simulation reasonable and accurate, the stress variation should be taken into account, and the excavation zone can be softened gradually. In the program, Young's modulus of the excavated zone is decreased continuously with the time step until it approaches zero.

These results show that with an unreasonable loading–unloading rate per time step, the fracture is strongly affected by the mesh shape and unrealistic fractures develop in the surrounding rock. To evaluate the role of the unloading rate, a comparatively small reduction rate of the opening zone's Young's modulus per time step was adopted, and the time step and viscous damping μ were set to reasonable values based on the above analysis. The resulting fracture process and EDZ are presented in Fig. 12. The simulation results show the following characteristics: the cracks are slightly clearer, but "jointed" fragmentation areas remain; although the in-situ stress is symmetrical, the cracks are less symmetrical; some cracks propagate independently and tend toward the direction \(\frac{\pi }{4} + \frac{\varphi }{2}\); and the fracture zone perpendicular to \(\sigma_{1}\) is underdeveloped, while some isolated cracks parallel to \(\sigma_{1}\) penetrate deep into the surrounding rock.

Fig. 12

Improved fracture characteristics of tunnel

The fracture process and EDZ after further improvement are presented in Fig. 13. The simulation results show the following characteristics: the cracks are blurred; the in-situ stress is symmetrical, and the cracks tend to be symmetrical; two sets of intersecting cracks form, oriented toward \(\frac{\pi }{4} + \frac{\varphi }{2}\), and the extent of crack propagation is reasonable. However, large areas of intact rock remain between the cracks, and the cracks are more random and irregular.

Fig. 13

Further improved fracture characteristics of tunnel

The fracture process and EDZ after further optimization are presented in Fig. 14. The simulation results show the following characteristics: the cracks are more symmetrical and consistent with the orientation of the in-situ stress, which is more reasonable; the two sets of intersecting cracks are more pronounced; and the EDZ in both directions is roughly 1.7–2.4. However, the fractures are still blurred.

Fig. 14

Fracture characteristics of tunnel after further optimization

The remaining differences in the simulated results are mainly governed by other parameters: the fracture parameters, namely the tensile strength \(f_{t}\), cohesion \(c\), friction coefficient \(\tan\varphi_{i}\), and mode I/II fracture energies \({\text{G}}_{\text{fI}}\)/\({\text{G}}_{\text{fII}}\); and the micro-parameters, namely the normal/tangential penalties \(P_{n}\)/\(P_{t}\) and the fracture penalty \(P_{f}\).
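To illustrate how \(f_{t}\) and \({\text{G}}_{\text{fI}}\) interact, the sketch below uses a simplified linear softening law for mode I, in which the area under the traction–opening curve equals the fracture energy, so the critical opening is \(\delta_{c} = 2{\text{G}}_{\text{fI}}/f_{t}\). FDEM codes generally use a non-linear softening curve instead, and the values \(f_{t}\) = 5 MPa and \({\text{G}}_{\text{fI}}\) = 50 N/m are hypothetical, not the calibrated values of Table 3. A higher fracture energy or a lower tensile strength lengthens the softening branch and delays full separation of a cohesive element.

```python
def linear_softening(f_t, g_f):
    """Return the critical opening and a traction-opening function for linear mode I softening.

    Opening is measured from the onset of softening (peak traction f_t); units: Pa, N/m, m.
    """
    delta_c = 2.0 * g_f / f_t          # triangle area 0.5 * f_t * delta_c equals g_f

    def traction(opening):
        if opening <= 0.0:
            return f_t
        if opening >= delta_c:
            return 0.0                 # fully broken cohesive element
        return f_t * (1.0 - opening / delta_c)

    return delta_c, traction


delta_c, traction = linear_softening(f_t=5.0e6, g_f=50.0)   # hypothetical values
n = 1000
h = delta_c / n
# Trapezoidal integration of the softening branch recovers the fracture energy.
energy = sum(0.5 * h * (traction(i * h) + traction((i + 1) * h)) for i in range(n))
print(delta_c, traction(delta_c / 2), energy)
```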

The FDEM parameters were then calibrated through UCS modeling, and a comparatively small loading rate of 0.01 m/s was adopted to further investigate the effect of loading rate on sample fracture. After calculation and calibration, the parameters listed in Table 3 bring the simulation results into close agreement with the laboratory results, as presented in Fig. 15a, where blue lines indicate shear fractures and red lines indicate tensile fractures. The UCS model shows a distinct principal shear crack zone with a shear angle of approximately 58° (= 45° + \(\frac{{\varphi}_{\text{i}}}{2}\)). Figure 15 shows that the simulated results and stress–strain curves are consistent with the laboratory results, and the comparison with the experimental failure (Fig. 15b) confirms that the simulated fracture mode reasonably reproduces the observed failure characteristics.
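The shear-angle relation quoted above can be inverted to estimate the internal friction angle implied by the simulated shear band. This is a back-of-the-envelope check using only the 58° angle from the text; it assumes the Mohr–Coulomb relation \(\theta = 45^\circ + \varphi_{i}/2\) between the shear plane and the horizontal in the UCS specimen, and the function name is hypothetical.

```python
import math


def friction_from_shear_angle(theta_deg):
    """Internal friction angle (deg) from theta = 45 + phi/2 (Mohr-Coulomb)."""
    return 2.0 * (theta_deg - 45.0)


phi = friction_from_shear_angle(58.0)
print(phi, round(math.tan(math.radians(phi)), 3))
```

The result (φ ≈ 26°, tan φ ≈ 0.49) could be compared with the friction coefficient \(\tan\varphi_{i}\) listed in Table 3.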

Table 3 FDEM input parameters of excavation models
Fig. 15

Numerical fracture progression and experimental failure results: a Numerical fracture progression of UCS; b Experimental failure

Therefore, a smaller \(\dot{\text{k}}\) ensures a quasi-static loading process; it should be set as small as practical and, in general, kept below 0.1 m/s in the simulation.

The excavation simulation method was refined to maintain a quasi-static loading rate and thereby improve precision. Based on the above analysis, a linear function was used for unloading: Young's modulus of the opening zone is reduced in 10 steps of equal decrement from the intact rock modulus to 0.0001 times that value. Because the problem is quasi-static, each subsequent reduction does not proceed until the kinetic energy of the system has dissipated to a comparatively small critical threshold. The input parameters used in the simulation are listed in Table 3.
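The staged reduction described above can be sketched as a loop that lowers the modulus in 10 equal decrements to 0.0001 \(E_0\) and moves on only once the kinetic energy has decayed below a threshold. The dynamics here are a hypothetical stand-in: each decrement injects kinetic energy proportional to the stiffness removed, and damping removes a fixed fraction per cycle. In the actual FDEM solver, the kinetic energy would come from the explicit time integration; the threshold, damping factor, energy-gain constant, and function name below are assumptions.

```python
def staged_softening(e0, n_steps=10, final_ratio=1e-4, ke_threshold=1e-6,
                     damping=0.8, energy_gain=1.0, max_cycles=100_000):
    """Reduce E in equal decrements, waiting for kinetic energy to settle after each one.

    Returns the final modulus, the residual kinetic energy, and the number of
    damping cycles spent after each decrement.
    """
    de = e0 * (1.0 - final_ratio) / n_steps      # equal decrement per excavation step
    e, ke = e0, 0.0
    cycles_per_step = []
    for _ in range(n_steps):
        e -= de
        ke += energy_gain * de / e0              # toy model: released stiffness -> kinetic energy
        cycles = 0
        while ke > ke_threshold and cycles < max_cycles:
            ke *= damping                        # toy model: lose a fraction (1 - damping) per cycle
            cycles += 1
        cycles_per_step.append(cycles)
    return e, ke, cycles_per_step


e_final, ke_final, cycles = staged_softening(1.0)
print(e_final, ke_final <= 1e-6, cycles)
```

Gating on kinetic energy rather than on a fixed number of cycles lets each step take as long as the system needs to return to a quasi-static state.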

The improved fracture process and EDZ of the tunnel are presented in Fig. 16a. The simulation results show the following characteristics: the fractures are clear, symmetrical, and denser; fracture propagation tends toward the directions \(\frac{\pi }{4} \pm \frac{\varphi }{2}\), forming two sets of intersecting cracks. Qualitatively, the extent of the fracture zone is consistent with the relative magnitudes of the two principal stresses. The final fracture characteristics of the tunnel are presented in Fig. 16b.

Fig. 16

Fracture characteristics of tunnel: a Preliminary simulation results (calculation only 30% complete); b Final fracture characteristics

6 Conclusions

To systematically investigate the sensitivity of model parameters, including the loading rate \(\dot{k}\), the time step, and the viscous damping μ, this study used FDEM to simulate rock mechanical behavior and fracture modes under different loading and unloading rates at both laboratory scale (uniaxial compression tests) and field scale (circular tunnel excavation). Loading-rate sensitivity was examined by comparing simulated and laboratory UCS results for the ISRM standard specimen size of Φ50 × 100 mm, and unloading-rate sensitivity was examined by optimizing the unloading rate and the system kinetic energy threshold in the Young's modulus softening process of the opening zone used by the 2D FDEM excavation method. The following conclusions are obtained:

(1) In the laboratory-scale simulation, as the loading rate increases, the UCS fracture mode changes and cracking becomes excessive. When \(\dot{\text{k}}\) increases to 0.2 m/s, the fracture mode differs significantly from the experimental result, whereas below 0.1 m/s the fracture mode and characteristics are normal. Thus, a smaller \(\dot{\text{k}}\) ensures a quasi-static loading process and should be kept below 0.1 m/s for laboratory-scale simulation. After calculation and calibration, a loading rate of 0.01 m/s brings the simulation results into close agreement with the laboratory results.

(2) To improve precision and maintain a quasi-static loading rate, a linear or exponential function should be used for unloading when reducing Young's modulus of the opening zone in the excavation step, and the kinetic energy threshold of the system should be set to a very small critical value. The fracture patterns and propagation directions show that a small unloading rate combined with a low kinetic energy threshold leads to stable and uniform crack propagation, avoiding blurred and asymmetric fracture patterns and an excessive proportion of fracturing within a single time step.

(3) The optimization process rationalized the unloading rate and time step of the 2D excavation method, clarified the values of the strength and damping parameters, and improved the kinetic energy dissipation mechanism of the system; moreover, the calculation time was significantly reduced. The kinetic energy of the system remains at a low level throughout the calculation.