Speed-up Measures and Associated Problems for Large-scale DEM-Models

This contribution discusses the possibilities to increase the efficiency of large Discrete Element Method (DEM) simulations. Simulations were conducted to test particle upscaling, decreasing shear modulus and using GPU instead of CPU for the computation. The conducted simulations modelled a simple ore extraction process from a defined outlet of a stope in a cave mine. The analysis is based on the influence of the approaches on the computation speed-up and the accuracy of the result. It was found that the real shear modulus could be decreased in the simulation by a factor of 103 without interfering strongly with the result, provided that the decreased shear modulus is not smaller than 108 Pa. However, a reduction of 102 was found to bring the highest speed-up to the present application. For the particle upscaling in this contribution without parameter calibration, the upscaling factor should not exceed 3. Higher upscaling factors affect the results significantly. However, the flow dynamics was already differing for an upscaling factor of 2, even though the flow zones were comparable. Using a GPU instead of a CPU is only recommended if the simulation contains a high number of particles. The speed-up for a simulation with almost 250,000 particles was over 8, but the advantage diminished for less particles. Furthermore, differences in the results between CPU and GPU computation could be found, which could be a starting point for future work. In summary, this research significantly aids in the development of more efficient DEM simulations for large-scale applications, such as cave mining.


Introduction
When doing large DEM simulations with millions of particles, either the computation time is usually very high or it is not feasible to simulate the models at all. The increase in computational power, both hard-and software, is helpful, but large DEM simulations still take a considerable amount of time. Three other ways to reduce the computation time are to decrease the shear modulus in the model, to reduce the number of particles by upscaling their size, and to use a graphics card (GPU) instead of an ordinary processor (CPU) to conduct the computation. The general target is to maintain realistic simulations even though the model is simplified in one way or another [1].
The first method of decreasing the shear modulus leads to a reduction of the particle stiffness. A shear modulus lower than the original permits larger particle overlaps in the simulation without creating excessive forces. Therefore, larger time steps can be used, resulting in fewer iterations and a lower computation time. It was seen in literature that the shear modulus could be reduced by a factor of 10 3 without affecting the DEM results strongly. However, the shear modulus should always be kept above 10 8 Pa [2]. The second approach of particle upscaling refers to the practice of replacing the original particles with larger particles. This results in a lower number of particles in the model [3]. This method needs to be used carefully as a too large upscaling factor can affect the results strongly and can no longer model the flow process realistically. This is especially the case when the process is controlled by the particle size. It was found in literature that the maximum upscaling factor is dependent on the application and values of up to 4 are possible without falsifying the results [4].
The third method to increase the speed of DEM simulations is to use a GPU instead of CPU. This approach is especially useful when computing large numbers of particles as a GPU can perform more calculations at the same time than a CPU. Depending on the model and the utilized hardware, speed-up factors of up to 35 could be achieved in the past [5]. A precondition to use GPU calculation is that the software facilitates this massive parallel computing. As a result, software companies are putting effort in developing GPU solver engines for their simulation programs. An example is Altair with the software EDEM, which was used for the simulations in this contribution [6]. The models testing the ways of speeding up DEM simulations mentioned above are discussed in the next section. The focus was on the limits and the effect on the accuracy.

Experimental Setup
To investigate the most promising way to speed up computation times, a relatively simple numerical experiment has been set up. The analysis focuses on the effect of particle upscaling, reducing the shear modulus and GPU computation on the solution time and the plausibility of the results. To enable a clean comparison of the different approaches mentioned above, a basic model is established and simulated in the beginning. In the subsequent simulations, only one parameter is changed at a time, e.g. the particle diameter or the shear modulus. This approach facilitates finding the maximum speed-up factor for each method and tests the results from the research performed in the chapters above. The properties of the ore and the equipment, summarized in Table 1, are equivalent to the properties calibrated by Koch [7] for iron ore particles of the mine in Kiruna, Sweden. It should be noted here that the shape of the particles in this contribution are different to the originally calibrated material and the behavior therefore might deviate to some extent. For an easier comparison, all simulations use spheres as particles to minimize the impact of the particles' shape.
The motivation behind this research is the development of the novel mining method called raise caving. In the joint work with LKAB Kiruna, particle flow was identified as a critical point for cave mining methods [8,9]. The DEM model replicates the material flow in the stope of a cave mine when material is extracted. Thereby, a stope is filled with broken material and the extraction is performed from the drawpoint at the bottom of the structure. The dimensions of the cuboid stope are 10 × 10 × 20 m and the outlet drift at the bottom possesses a cross-section of 5 × 5 m (see Fig. 1). The continuous extraction of the material is conducted 1.5 m outside the brow of the draw-point, which is also the length  of the attached drift. The CAD file of the stope is imported to EDEM as a geometry with the same interaction parameters with the particles as for the particles among themselves.
The simulations used 16 cores of an Intel® Core™ i7-5960X [10] Processor with 3.00 GHz. For a better understanding of the results, a distinction must be made between model time and computation time. The model time refers to the real duration which is represented in the model. This is used to compare the flow dynamics, meaning the mass flowing out of the stope in a given amount of model time.
In contrast to this, the computation time refers to the time it takes to run the simulation. This is the significant time measured and compared to find the speed-up ratios. The speedup value is determined as computation time of the original model divided by the computation time of the model with the applied speed-up measure.
The so-called Rayleigh time step can be calculated automatically by the software and serves as the basis for the calculation time step [11]. The Rayleigh time step is influenced by the particle radius and the shear modulus among other factors. The calculation time step can be given as a specific fraction of the Rayleigh time step and usually ranges around 20% or below [12].
To compare the quality of the results, the flow zones inside the stope of each model is analyzed after an extracted mass of 1 kt (see Fig. 2). The non-extracted particles are colored according to the traveled distance towards the drawpoint at the time of plotting the image. The black particles describe the extraction zone, which is the original area of the extracted material. This was the reference area whose width, height, and depth are compared between the models. This was done to see if the results of the simulations are still comparable to the original and therefore the speedup measures are valid and reliable. Furthermore, the flow dynamics is compared by the discharge rate, meaning the model time it took to extract 1 kt of material.

Decreasing Shear Modulus
The original shear modulus of the investigated material is 3.7e + 10 Pa. In five different models, this value was reduced  to 3.7e + 06 Pa. The particle diameter was set to 0.2 m. For the models, 15% of the Rayleigh time step was chosen for the computation. However, the models with a very low shear modulus contained a higher number of particles due to a larger overlap. Thus, they did not stabilize even without external interference, which forced a smaller time step for these models. This and the increased number of particles nullified the effect of the shear modulus decrease. The simulation results including the used time steps, speedup factors, and flow dimensions can be seen in Table 2. The achieved speed-up for every shear modulus is shown graphically in Fig. 3. The deviation of the results from the original model is shown in Fig. 4.
The results summarized in Table 2 and Fig. 4 highlight the problems with the low shear modulus, see the column on the right-hand side of Table 2. It needs to be mentioned that, especially for the model with 3.7e + 06, the flow situation is not comparable anymore due to an excessive overlap of the particles and hence a much higher bulk density. This means that the reasons for the problems mentioned require further investigation.
The simulations demonstrated that reducing the shear modulus can accelerate the computation time. The achieved speed-up factors are significant and reach up to 10. A reduction of more than 10 2 could not bring further improvements in the present application. This was mainly due to the weight acting from the particle pile on the lower particles. This led to excessive overlaps of the particles and demanded a decrease of the time step for a small shear modulus.

Particle Upscaling
Using the findings from the previous section, a shear modulus of 3.7e + 08 is used in the following simulations. The particle diameter was 0.2 m for the basic model and was multiplied with upscaling factors of 2, 3, 4, and 5 for the subsequent simulations. All the simulations are done using a time step which is 10% of the Rayleigh time step. The simulation parameters and results of the simulations can be found in Table 3. It can be seen that the flow dynamics changes as the particles are upscaled. The larger the particles, the longer it takes for 1 kt of the material to flow out of the outlet. In the meantime, the flow zone dimensions are comparable for different upscaling factors after 1 kt of extracted material. This effect has a geometrical background. The opening size is the same for all models, but the particles get larger and hinder each other from flowing out. For applications where the flow dynamics are important, the upscaling approach is therefore inadequate. If only the flow zones are of interest, this approach might still be valid. This indicates that particle upscaling should be evaluated carefully, and a subsequent parameter calibration can be beneficial.
As the reference value for comparison was the extracted mass of 1 kt and the larger particles had a slower outflow, the upscaled particle simulations had to be run for a longer

GPU Calculation
The same upscaling simulations as before were also simulated with a GPU to see the effect on the computation time.
The used GPU was the AMD Radeon VII [13]. The results of the GPU simulations can be seen in Table 4. The comparison of the speed-up factors for the particle upscaling simulations on GPU can be seen in Fig. 7. The resulting accuracy is shown in Fig. 8. The upscaling factor of 2 resulted in a speed-up factor of almost 2 compared to the original simulation. Like in the CPU calculations, a disturbed flow dynamics was observable. This means that 1 kt of mass needed more time for larger particles to flow out than for smaller ones. Hence, no other upscaling factor could achieve a faster computation than the original model. A second reason for the decrease in performance is inherent to the GPU, which holds advantages mainly for a large number of particles. As there are fewer particles included in the model with upscaled particles, the advantage of the fast GPU vanishes.
Comparing the GPU speed to the one of the CPU, the advantage is also mainly present for the original model with a high number of particles. This can be seen in Fig. 9, in which the speed-up of GPU over CPU is shown. Higher upscaling factors of 3 and 4 with fewer particles involved in the model show a superiority of the CPU. That underlines the research, which shows that cost intensive GPUs are only effective for a high number of particles. At an upscaling factor of 3, the simulation time is even increasing for the GPU, making the particle upscaling method inadequate in this application.
The upscaling factor of 5 could not be computed until the 1 kt extraction because a hang-up occurred, which was not the case for the CPU calculation. It becomes apparent that the GPU calculation gives slightly different results to the ones of the CPU. There are different reasons why CPU and GPU simulations might deliver slightly different results despite starting from the identical time step. When using the OpenCL solver, no simulation data is transferred between the GPU and the CPU. Therefore, the time steps that require a data save will be run fully on the CPU. The combination of  CPU and GPU may account for the observed differences in the results [14]. Additionally GPUs produce approximation and round-off errors, also called truncation errors, which may lead to less accurate results [15]. Generally, this finding of different results between CPU and GPU calculation should be a point for future investigations.

Summary and Discussion
The objective of this contribution is to improve the understanding of speed-up measures for large DEM models using the example of a cave mining simulation. The models used in this contribution recreated the extraction process from the bottom of a filled stope. Using simplified, spherical particles helped to ensure the comparability of the different simulations. Without a proper calibration, the results of flow zones should be seen qualitatively as the focus was rather on the speed-up. Three methods were tested, namely particle upscaling, decreasing shear modulus, and using GPU instead of CPU for computation.
It was found in past DEM research activities that using the correct shear modulus is not significant for realistic DEM results. However, a smaller shear modulus can decrease the computation time as it allows larger time steps. Literature however, showed that the results are highly questionable when the shear modulus of rocks becomes lower than 10 8 Pa. This was also supported by the simulations in this contribution. A reduction of the shear modulus larger than by the factor of 10 2 could not bring any improvement in computation time for the present application. This was due to the weight pressure of the particle pile over the lowest portion of particles. This led to high overlaps in the model of low shear modulus and demanded a lower timestep. Also, the bulk density changed strongly due to the overlapping particles when the shear modulus became too small.
The particle upscaling simulations showed that it is only reliable up to a certain point and requires parameter calibration beyond that. Upscaling factors of up to 3 delivered similar flow zone results as the original model. However, flow dynamics differed, and the outflow was slower for larger particles. For applications where the flow dynamics is of interest, the particle upscaling method is therefore inadequate, at least without subsequent parameter calibration. For the present application where just the flow zones were investigated, an upscaling factor of 2 was shown to be optimal. Further upscaling reduced the speed again due to longer outflow times.
Using a GPU instead of a CPU for the upscaling simulations was seen to only be advantageous with a high number of particles. While a simulation with almost 250,000 particles resulted in a speed-up of over 8, the advantage diminished for less particles. For smaller simulations with 10,000 particles or fewer the CPU showed a superiority. Furthermore, slight differences in results could be observed between CPU and GPU calculation, which should be a point of future investigation.
Overall, the contribution provides insights into the benefits and limits of different methods to improve the efficiency of large DEM models using the example of a cave mining stope. The contribution highlights the direction of speedup methods and simulations and shows the potential for future research to optimize processes without sacrificing accuracy.

Funding. Open access funding provided by Montanuniversität Leoben.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.