Use of machine learning for unraveling hidden correlations between particle size distributions and the mechanical behavior of granular materials

A data-driven framework was used to predict the macroscopic mechanical behavior of dense packings of polydisperse granular materials. The discrete element method (DEM) was used to generate 92,378 sphere packings covering a wide variety of particle size distributions (PSDs) lying within two particle sizes. These packings were subjected to triaxial compression and the corresponding stress–strain curves were fitted to Duncan-Chang hyperbolic models. An artificial neural network (NN) scheme was able to anticipate the values of the model parameters for all these PSDs, with an accuracy similar to the precision of the experiments, even when the NN was trained with only a few hundred DEM simulations. The estimations were indeed more accurate than those given by multiple linear regressions (MLR) between the model parameters and common geotechnical and statistical descriptors derived from the PSD, and this was achieved despite the presence of noise in the training data. Although the results of this massive simulation campaign are limited to specific systems, packing procedures and testing conditions, the NN revealed the existence of hidden correlations between the PSD and the macroscopic mechanical behavior.


Introduction
The specific values of properties such as strength, compressibility and permeability of dry and cohesionless coarse-grain materials (including sand, gravel, railway ballast or rockfill) depend on the features of the constituent particles (intrinsic properties) and on the way in which the particles are arranged (state parameters). Among the intrinsic properties of a sand, the surface friction, the compressibility and the strength of individual grains, the particle shape and the particle size distribution are known to play a crucial role in its macroscopic properties [1,2,3,4]. Relative density and confining pressure are the most influential state variables for dry granular soils [5] and govern the mechanical behavior of the material to a large extent [6,7,8].
The relationship between the particle size distribution, PSD, and the mechanical behavior is not yet fully understood. On one hand, the effects of variations in the PSD are not independent from those produced by variations of other intrinsic properties or state parameters. For example, the state parameter ψ, proposed within the theoretical framework of the critical state of sands [5], helps to distinguish between the contractive or dilatant behavior exhibited by a sand upon triaxial compression. However, the critical state line, and hence the value of ψ associated with a given void ratio e, changes with the PSD [9]. As another example, there is a complex interplay between size and shape polydispersity, as shown by numerical modeling [10]. On the other hand, linking single quantities (maximum and minimum dry density, critical state void ratio, macroscopic friction angle, stiffness, etc.) to a PSD is not immediate, since the latter is a highly variable curve that is often long-tailed and/or multi-modal. Descriptors derived from the PSD are not enough to anticipate macroscopic (void ratio, stiffness, friction angle) or microscopic features (average coordination number, fraction of non-contributing particles, etc.) obtained after a given process. Neither geotechnical descriptors, such as D_xx (i.e., the sieve size passed by xx percent in weight of the sample), the coefficient of curvature C_c or the uniformity coefficient C_u, nor statistical descriptors (mean, variance, skewness, kurtosis, etc.) enable satisfactory estimations. There is no clear procedure to work directly with the whole PSD curve. Even in the case of very idealized systems (e.g., packings of spheres), variations of the PSD may lead to considerable differences in the fabric resulting from a packing protocol [11,12], in the relative density [4] or in the shear strength [13].
In the case of non-idealized systems the situation can be even worse, as several kinds of physico-chemical phenomena occur on different length and time scales. Relationships between geotechnical descriptors obtained from the PSD and geotechnical properties have been sought (e.g., [14,15,16]), but the findings are always empirical and limited to a specific set of soils and stress paths.
The use of large datasets enables promising techniques to understand how the complex behavior of granular systems can be anticipated from microscopic features. For example, the use of machine learning techniques, together with complex network theory, has allowed for the establishment of relationships between the fabric of a packing and some macroscopic geotechnical properties, such as the permeability [17,18] or the effective heat transfer coefficient [19]. These techniques were also applied in [20] for obtaining morphological information of granular materials, including their particle size distribution, from X-ray computed tomography images. The use of artificial neural networks, or simply Neural Networks (NN), has been proposed as a potentially useful technique to model material behavior [21,22]. In the case of geotechnical applications, NNs have been used for unsaturated soils (to predict the shear strength [23], to model their mechanical behavior [24] and to determine the effective stress parameter [25]), for fine-grain soils (to predict the compression index from granulometry and index properties [26]), for rocks (to predict the uniaxial compressive strength and the elastic modulus [27]) and for coarse-grain soils (sands and gravels) to model the mechanical behavior [28,29,30]. The inputs for these NN approaches included both intrinsic properties and state parameters. In some of these cases the target outputs were directly model parameters (namely, the compression index [26], the apparent cohesion [23], the effective stress parameter [25] or the elastic modulus and uniaxial compressive strength [27]). For all the other cases mentioned above (i.e., [28,29,30]), the purpose of the NNs was to reproduce the stress-strain curve by anticipating new values of stress or strain obtained when some others were changed in a controlled way. The datasets were the result of a limited number of laboratory experiments (from several tens to a few hundred).
Only in [26] did the database include data from nearly one thousand experiments. In some cases, a single stress-strain curve measured in a laboratory experiment was used to gather the data. NNs have also been successfully used to link parameters used in DEM simulations to the macroscopic behavior of granular materials (angle of repose [31] or hopper discharge rate [32]). These studies have shown how NNs enable the understanding of the bulk behavior of granular materials with a reduced number of virtual experiments.
In this research, the role played by the PSD in the mechanical behavior of an idealized system of polydisperse spheres has been investigated by means of massive numerical testing with the DEM and NNs. This approach may shed light on the mechanical behavior of dry coarse-grain soils. This is a very timely moment, since new techniques, such as X-ray tomography, are allowing for an exhaustive characterization of the microstructure of granular packings (including PSD and fabric) [33]. There are two considerable differences with respect to the previously referenced works. On one hand, the NN is built on a dataset that is the outcome of a series of more than 90 thousand virtual experiments, performed with samples of varying PSD. The set of PSDs is the outcome of a systematic exploration of the possible cases lying within two particle sizes. The probability and size increments used to discretize the sample space determined the number of PSDs to explore. We simulated all the cases in order to have a sufficiently large data sample, to find out how the accuracy of the estimations depends on the size of the training dataset, and to determine the probability distribution functions (PDFs) of the target outputs of the NN. On the other hand, we focus on the role played by the PSD in the macroscopic behavior. We used quite simple granular systems (made of elastic and frictional spheres) and experiments in order to focus exclusively on the differences in mechanical behavior due to the PSD.
The proposed approach is ab initio, as no phenomenological laws are used (except for the contact-mechanics interaction). Neither intrinsic parameters that cannot be defined on the grain scale (such as maximum or minimum dry density, etc.) nor state parameters related to packing features (void ratio, average coordination number, etc.) were introduced. The mechanical features of the particles and the packing and compression protocols were always the same, and the only difference from one case to another is the PSD. Despite the simplicity of the systems, non-linear and stress-dependent stress-strain curves were observed (showing the typical behavior of loose sands), with non-obvious variations from one case to another. The data were fitted to the celebrated Duncan-Chang hyperbolic model [34], which is defined by two model parameters, namely the tangent elastic modulus E_0 and the ultimate deviatoric stress σ_ult.
Thus, the proposed NN receives as input a discrete description of the PSD of the granular material at hand, and returns as output E_0 and σ_ult. As will be illustrated below, the network is able to predict the Duncan-Chang model parameters with high accuracy, extremely fast, and even in the presence of noisy training data. Indeed, it proved itself a powerful tool for unraveling correlations, hidden to the naked eye, between the PSD of granular materials and their macroscopic mechanical behavior.
The rest of this paper is structured as follows: Initially, the discrete element method used for the generation of virtual triaxial experiments, as well as the considered PSDs and the obtained results are described in Section 2; secondly, in Section 3, we present the basic principles of artificial neural networks, together with the design of the networks used in this work and their training process; the results obtained with the NNs are presented and discussed in Section 4, as well as a study of the amount of required data to train them and their robustness with respect to noisy data; finally, conclusions are drawn in Section 5.

Massive DEM triaxial testing
The discrete element method [35], DEM, has been proven to be a very effective tool for the study of the macroscopic mechanical behavior of granular materials under drained [7,36,37,38,39,40,41,42,43,44] and undrained [45,46,47] triaxial or biaxial compression. In this work the DEM is used to perform virtual drained triaxial tests for a large number of sphere packings with different PSDs. In what follows we describe the model used for carrying out such simulations, as well as the obtained results.

Numerical setup
We performed 92,378 DEM simulations of triaxial compression tests on samples made of particles following varying PSDs. The PSD used in each case was selected according to a systematic exploration, described as follows: particle diameters ranged between D_min = 0.05 m and D_max = 0.15 m; this interval was divided into 10 equal-size bins (D_i, D_i + ΔD], with ΔD = (D_max − D_min)/10, D_0 = D_min and i = 0, 1, ..., 9. The central size of each bin is d_i = D_i + 0.5ΔD. The expected percentage in mass of the particles within each size bin i is denoted p_i. We consider that p_i is a discrete variable that can take values from 0.0 to 1.0, spaced by 0.1. All possible combinations {p_i}_{i=0}^{9} satisfying Σ_{i=0}^{9} p_i = 1.0 are considered. This procedure led to the 92,378 PSDs that were subsequently used in the triaxial tests.
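This combinatorial exploration can be reproduced with a short script: the number of ways of distributing 10 probability steps of 0.1 among 10 bins is the number of weak compositions, C(19, 9) = 92,378, matching the number of DEM simulations. A minimal sketch (function and variable names are ours, not from the paper's code):

```python
from itertools import combinations

def enumerate_psds(n_bins=10, steps=10):
    """Enumerate all mass-fraction vectors {p_i} with p_i in {0.0, 0.1, ..., 1.0}
    and sum(p_i) == 1.0, via a stars-and-bars construction."""
    psds = []
    # Place n_bins - 1 "bars" among steps + n_bins - 1 slots; the gaps
    # between bars count the 0.1-steps assigned to each bin.
    for bars in combinations(range(steps + n_bins - 1), n_bins - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(steps + n_bins - 2 - prev)
        psds.append(tuple(c / steps for c in counts))
    return psds

psds = enumerate_psds()
print(len(psds))  # prints 92378
```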
Once all the PSDs were defined, a random sample of particles was generated for each of them. The mass of the particles was uniformly distributed within each bin. The considered set of PSDs includes very different kinds of granular systems: monodisperse, well graded, gap-graded, multimodal distributions, etc. A few of them, which may be more recognizable to readers, were particularly considered for illustrative purposes. These special PSDs are labeled and shown in Fig. 1(a).
For each PSD, a sample was generated by randomly locating a loose cloud of around 20,000 spherical particles within a cubic box (Fig. 2(a)). We imposed periodic boundary conditions and then the cubic box was isotropically shrunk to achieve a dense packing under isotropic compression conditions σ_1 = σ_2 = σ_3 = 100 kPa, where σ_1, σ_2 and σ_3 are the principal stresses (Fig. 2(b)). Then the stress was kept constant in two perpendicular directions (σ_2 and σ_3), while the sample was shortened in the third direction until reaching a unit strain ε_1 = 0.2 (Fig. 2(c)). The corresponding average stress σ_1 was measured at several strain levels. The deviatoric stress-strain curve, σ_d = σ_1 − σ_3 vs ε_1, was registered and fitted to a Duncan-Chang hyperbolic model [34], which is defined by two model parameters, namely the tangent elastic modulus E_0 and the ultimate deviatoric stress (σ_1 − σ_3)_ult = σ_ult:

σ_d = ε_1 / (1/E_0 + ε_1/σ_ult).

A few examples of the generated curves (corresponding to the special PSDs) can be seen in Fig. 1(b).
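The Duncan-Chang hyperbola becomes linear in the transformed variables ε_1/σ_d vs ε_1 (intercept 1/E_0, slope 1/σ_ult), so both parameters can be fitted with ordinary linear regression. A minimal sketch of such a fit; the synthetic data and parameter values below are illustrative, not taken from the paper:

```python
import numpy as np

def fit_duncan_chang(eps, sigma_d):
    """Fit sigma_d = eps / (1/E0 + eps/sigma_ult) via the classical
    transformed regression: eps/sigma_d = 1/E0 + eps/sigma_ult."""
    slope, intercept = np.polyfit(eps, eps / sigma_d, 1)
    return 1.0 / intercept, 1.0 / slope  # E0, sigma_ult

# Synthetic check with hypothetical parameter values (Pa)
E0_true, sig_ult_true = 800.0e3, 280.0e3
eps = np.linspace(0.005, 0.2, 40)
sigma_d = eps / (1.0 / E0_true + eps / sig_ult_true)
E0, sig_ult = fit_duncan_chang(eps, sigma_d)  # recovers the true values
```

With noisy DEM data, a non-linear least-squares fit of the hyperbola itself is an alternative, but the transformed regression is the standard and simplest choice.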

Numerical model
We used the DEM implemented in YADE-DEM [48]. Particles behave as rigid solids that obey the laws of classical mechanics. The interaction between particles is produced through a soft contact model. In particular, we used a simple linear elastic and frictional contact law, a common choice in DEM simulations [35,49]. Normal forces between particles are thus computed as

F_n,ij = k_n δ_ij n_ij,

where F_n,ij is the normal force exerted by particle j on particle i, δ_ij = |r_ij| − (R_i + R_j) is the overlap distance, R_i and R_j are the particles' radii, r_ij is their relative position vector, n_ij = r_ij/|r_ij| is its associated unit vector, and k_n is the normal contact stiffness. In this model, k_n was related to the Young's modulus of the material, E = 1.0 GPa, as k_n = 2E R_i R_j/(R_i + R_j).

[Figure 2: 1) a random loose cloud of around 20,000 particles is located within a box; 2) the simulation box is reduced to achieve a packing that is in equilibrium under isotropic stress; and 3) the simulation box is reduced in one direction while the stress is maintained in the two perpendicular directions.]
If two particles that are in contact (i.e., δ_ij < 0) are displaced in a direction ξ_ij/|ξ_ij| perpendicular to n_ij, an opposing shear force appears. Shear forces are limited by the inter-particle friction:

F_s,ij = −k_s ξ_ij, with |F_s,ij| ≤ |F_n,ij| tan φ,

where F_s,ij is the shear force exerted by particle j on particle i, ξ_ij is the total tangential displacement of the contact, φ = π/6 radians is the inter-particle friction angle and k_s = 0.25 k_n is the shear stiffness. The density of the particles, ρ = 10^6 kg/m^3, was scaled (as were the size of the particles and the stiffness) to increase the collision time and therefore the critical timestep used in the explicit integration of the equations of motion. The maximum strain rate imposed during the triaxial compression was fixed according to this critical timestep and updated on the fly to speed up the simulations. Numerical damping was used to dissipate kinetic energy. Details can be found in [48].
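The contact law above can be sketched in a few lines. The following is a simplified illustration of a linear elastic-frictional contact (sign conventions and the incremental shear update of an actual DEM code such as YADE may differ):

```python
import numpy as np

def contact_forces(pos_i, pos_j, R_i, R_j, xi_ij, E=1.0e9, phi=np.pi / 6):
    """Linear elastic-frictional contact between two spheres: normal force
    F_n = k_n * delta * n and Coulomb-capped shear force F_s = -k_s * xi."""
    r_ij = pos_j - pos_i
    dist = np.linalg.norm(r_ij)
    delta = dist - (R_i + R_j)            # overlap distance (< 0 in contact)
    if delta >= 0:
        return np.zeros(3), np.zeros(3)   # no contact, no forces
    n_ij = r_ij / dist
    k_n = 2.0 * E * R_i * R_j / (R_i + R_j)
    k_s = 0.25 * k_n
    F_n = k_n * delta * n_ij              # repulsive force on particle i
    F_s = -k_s * xi_ij                    # trial (elastic) shear force
    F_s_max = np.tan(phi) * np.linalg.norm(F_n)  # Coulomb friction limit
    if np.linalg.norm(F_s) > F_s_max:
        F_s *= F_s_max / np.linalg.norm(F_s)     # slide: cap the shear force
    return F_n, F_s
```

For two R = 0.05 m spheres overlapping by 0.01 m, k_n = 2E·R/2 = 5·10^7 N/m and the normal force magnitude is 5·10^5 N; a large tangential displacement is clipped to tan(π/6) times that value.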

Precision and performance
As the generated samples include a finite number of particles, the computed stress-strain curves for a single PSD may fluctuate around the expected values. Accordingly, the values of the Duncan-Chang model parameters obtained from a single DEM triaxial test are only point estimates, E_0,[DEM] and σ_ult,[DEM], which generally differ from the expected values, Ē_0 and σ̄_ult. There are several reasons for this variability: the size of the particles used in each simulation is randomly chosen according to the PSD, the particles are randomly located within the simulation box, and the system is chaotic. In any case, the larger the sample, the smaller the fluctuation. The expected variability of the measurements was assessed through a series of virtual triaxial tests. These tests were performed with samples made of a varying number of particles but always following the same PSD (the uniform PSD in Fig. 1). The experiment was repeated 15 times for each number of particles to gather a statistical sample of E_0 and σ_ult values. A coefficient of variation was defined for each model parameter x as CV_M,x = s_x/x̄ (where s_x is the sample standard deviation, x̄ is the sample mean and M stands for measurement). Results are shown in Fig. 3. In order to achieve a good compromise between accuracy and computational cost, the size of the samples was limited to around 20,000 particles in the DEM experiments used to train the NN. With this number of particles, the CV_M of E_0 and σ_ult are expected to remain around 0.05 and 0.10, respectively (see Fig. 3). These numbers indicate that the measurement of E_0 is more precise than that of σ_ult.
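The coefficient of variation used here is simply the sample standard deviation divided by the sample mean of the repeated measurements. A small sketch (the repeated E_0 values below are hypothetical, for illustration only):

```python
import numpy as np

def coefficient_of_variation(samples):
    """CV_M = s_x / mean(x): sample standard deviation (ddof=1) over mean."""
    samples = np.asarray(samples, dtype=float)
    return samples.std(ddof=1) / samples.mean()

# Hypothetical repeated measurements of E0/E from identical-PSD samples
e0_repeats = [7.9e-4, 8.1e-4, 7.6e-4, 8.3e-4, 7.8e-4]
cv = coefficient_of_variation(e0_repeats)  # a few percent for this spread
```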
The numerical experiments were computed using version 2018.02b of YADE-DEM [48], running on Ubuntu 18.04.4 64 bits, on a server machine with four Intel Xeon Gold 6148 2.40 GHz processors, with 20 physical cores each, and 1 TB of RAM. As a rough estimation, each single DEM simulation took on average 1 hour and 20 minutes on a single core. Therefore, the total computation time for processing the 92,378 samples was around 5,135 days. In order to speed up the process, many computer cores were used for running multiple independent simulations in parallel. Thus, the total processing time was reduced to approximately four and a half months of computation.

Virtual triaxial testing results
The 92,378 samples were virtually subjected to triaxial compression. The corresponding stress-strain curves presented the typical behavior of loose sands. A good match between each series of data and a Duncan-Chang hyperbolic curve was achieved. The values of E_0 obtained from DEM after a flat sampling over the set of PSDs are distributed as shown in Fig. 4(a). The sample mean is Ē_0/E = 7.83·10^−4, its standard deviation is s_E0/E = 8.59·10^−5, and the maximum and minimum values are E_0,max/Ē_0 = 1.82 and E_0,min/Ē_0 = 0.75, respectively. The coefficient of variation of this problem quantifies how far the expected value for a specific PSD may depart from the mean value across all the PSDs. For the tangent elastic modulus, the coefficient of variation is CV_E0 = 0.11. With respect to the values of σ_ult obtained from DEM, the distribution is shown in Fig. 4(b); the sample mean is σ̄_ult/E = 2.81·10^−4, its standard deviation is s_σult/E = 3.72·10^−5 (CV_σult = 0.13), and the maximum and minimum values are σ_ult,max/σ̄_ult = 1.71 and σ_ult,min/σ̄_ult = 0.51, respectively.
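As a quick arithmetic check, the reported coefficients of variation follow directly from the quoted sample means and standard deviations:

```python
# CV = standard deviation / mean, using the values reported above
cv_E0 = 8.59e-5 / 7.83e-4    # s_E0 / mean(E0)
cv_sig = 3.72e-5 / 2.81e-4   # s_sigma_ult / mean(sigma_ult)
print(round(cv_E0, 2), round(cv_sig, 2))  # prints 0.11 0.13
```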
These results evidence that variations of the Duncan-Chang model parameters arise depending on the PSD. Unfortunately, there is no sign of correlation between the two model parameters (see Fig. 4(c)). In addition, they do not correlate with the set of inspected statistical or geotechnical descriptors described in Table 1, as can be observed in Figs. 5 and 6, respectively. For the 9 special PSDs included in Fig. 1, the values of the considered statistical and geotechnical descriptors are gathered in Table 2 and also shown in Figs. 5 and 6.
In light of these results, establishing relationships between PSD descriptors and Duncan-Chang model parameters does not seem feasible.

Artificial neural networks
Artificial neural networks, or simply Neural Networks (NN), are biologically inspired computing systems able to learn from data. Data abundance, together with increasing computing power, are probably the two main factors behind the great success of these algorithms and their exponential growth during the last decade, despite the fact that their origin dates back to the early 1940s [50]. Artificial neural networks, together with other machine learning techniques, have proven to be very successful tools for tackling tasks such as image recognition, language processing or financial forecasting, to name just a few. Beyond doubt, machine learning in general, and neural networks in particular, are powerful tools for untangling complex patterns in large datasets.
Motivated by the apparent lack of correlation between PSD descriptors and the Duncan-Chang model parameters evidenced in the previous section, in this work we present, as an accurate alternative, the use of NNs for inferring the macroscopic mechanical behavior of polydisperse granular packings. As will be seen in the results presented in Section 4, this tool will help us to find hidden connections between the particle size distribution of spherical packings and their macroscopic mechanical behavior.

The multilayer perceptron
One of the simplest and most commonly used NN architectures is the Multi-Layer Perceptron (MLP). The MLP can be seen as a non-linear function that maps input data to output data. It consists of several layers: one input layer, one or more (intermediate) hidden layers, and one output layer. The input information is fed forward from the input layer, through all the intermediate layers, up to the output layer. Each layer is composed of one or more nodes (or neurons), which are the basic computational units (see Fig. 7). At each layer, the neurons are fed with the output generated by the neurons of the previous layer; they process the data, filter it through a non-linear activation function, and produce new output values that feed the neurons of the next layer (if any). The presence of non-linear activation functions grants NNs the ability to approximate non-trivial functions. Indeed, as stated by the universal approximation theorem [51], feed-forward NNs with a single (finite) hidden layer and differentiable activation functions can approximate any continuous function; and with two or more hidden layers, any function [52].

Let us describe how an MLP generates output values from a given input. Let L + 1 be the number of layers in a NN, such that L ∈ Z+ and L > 1, and let N^(l) ∈ Z+ be the number of neurons of the l-th layer, with l = 0, ..., L, where layer 0 corresponds to the input layer and the L-th layer is the output one. We denote by x^(l) ∈ R^{N^(l)} the input vector of the l-th layer; accordingly, x^(l+1) ∈ R^{N^(l+1)} is the output of that layer and the input of the next one. Thus, x^(0) are the input values of the network and x^(L) are the output ones.
Starting from the input vector x^(0), the values of layers 1 to L are computed through the recursive expression

x^(l+1) = ϕ(W^(l) x^(l) + b^(l)),   l = 0, ..., L − 1,

where W^(l) ∈ R^{N^(l+1)×N^(l)} and b^(l) ∈ R^{N^(l+1)} are the weights matrix and the bias vector, W^(l) x^(l) is a matrix-vector product that results in a vector of length N^(l+1), and ϕ: R → R is the non-linear activation function. Among others, the Rectified Linear Unit (ReLU) function [53], defined as ϕ(z) = max(0, z), is one of the most commonly used activation functions. When z is a vector, as in the expression above, ϕ is applied to each vector component independently.
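The recursion above amounts to a few lines of linear algebra. A minimal NumPy sketch of the forward pass of a 10-32-2 network such as NN1, with random, untrained parameters for illustration only (note that, as in this work, ReLU is applied to the output layer as well):

```python
import numpy as np

def relu(z):
    """ReLU activation, applied component-wise."""
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Feed-forward pass: x^(l+1) = relu(W^(l) x^(l) + b^(l))."""
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)
    return x

# A 10-32-2 network (10 PSD fractions in, E0 and sigma_ult out)
rng = np.random.default_rng(0)
sizes = [10, 32, 2]
weights = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(2)]
biases = [rng.standard_normal(sizes[l + 1]) for l in range(2)]
psd = np.full(10, 0.1)                 # the uniform PSD as input
out = mlp_forward(psd, weights, biases)  # two non-negative outputs
```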
On the other hand, the coefficients of the weights matrices W^(l) and the bias vectors b^(l) are a collection of trainable parameters that describe the NN. These parameters, initially unknown, are determined by means of a process known as training. The goal of the training is to find a set of values for those parameters that leads to an accurate input-output mapping of the network for the training dataset (a subset of the available input-output samples). Finding locally optimal parameters amounts to minimizing a (loss) function that measures the distance (in a certain norm) between the known output sample values and the ones predicted by the network. The mean squared error, used in this work, is one of the most commonly used loss functions. Such optimization is commonly carried out by means of gradient-based iterative algorithms, like those of the family of stochastic gradient descent methods, such as Adam [54].
For an in-depth discussion of MLPs and other NN architectures we refer the interested reader to [50,55].

NNs for predicting Duncan-Chang model's parameters from PSDs
In this work we used MLP networks to predict the parameters of the Duncan-Chang model from a discrete description of the particle size distribution of a given spherical packing. The use of MLP networks in the context of DEM simulation has allowed for the establishment of links between model features and bulk material behavior [31]. The definition, training and evaluation of the NNs were implemented using TensorFlow [56]. Thus, as can be seen in Fig. 7, the designed network receives as input the ten PSD-related values {p_i}_{i=0}^{9} defined in Section 2.1, and returns as output E_0 and σ_ult. Therefore, the input and output layers have 10 and 2 neurons, respectively.
The best network architecture (number of hidden layers and neurons) for the problem at hand is unknown a priori. Thus, in order to choose a good architecture, we systematically explored network configurations with different numbers of hidden layers and neurons per layer. In the results presented in Section 4, networks with 1, 2, 3 and 4 hidden layers and 8, 16, 32 or 64 neurons per layer (16 different architectures) were considered. For all of them, the ReLU activation function was used in all the layers, including the output one.
The available virtual triaxial dataset (92,378 samples) was divided into three separate groups: the training dataset, composed of 72% of the total available samples (66,512); the cross-validation dataset, 8% of the total samples (7,390); and the test dataset, constituted by the remaining 20% of the samples (18,476). These three datasets were chosen randomly; nevertheless, they remained fixed along the different analyses performed. Whereas the training dataset was used for training the NNs, the cross-validation dataset helped us to compare the networks' performance and to verify the absence of undesired overfitting effects during the training process. Finally, the test dataset was used for measuring the accuracy of the chosen networks when predicting a series of cases that were not used during the training process.
The training process of all the networks was carried out using Adam [54] with 1000 epochs (training iterations through the whole training dataset). In order to speed up the training process, the input and output data were previously normalized. Three different learning rates were considered for Adam, namely α = 10^−2, 10^−3 and 10^−4. The network training is an inherently random process for two main reasons: Adam is by definition a stochastic algorithm in which the training samples are processed in a random order at each iteration, and the network's weights are randomly initialized. Thus, in order to average over these sources of randomness, each one of the 16 network architectures was trained 5 times for each learning rate. After this training process, the network with the lowest loss-function value for the cross-validation dataset was chosen. No large differences were observed among the different architectures; nevertheless, a NN with a single hidden layer of 32 neurons presented a slightly better performance (network NN1 in Table 3).
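The sweep described above amounts to 16 architectures × 3 learning rates × 5 random restarts, i.e., 240 independent training runs. It can be enumerated as follows (variable names are illustrative, not from the paper's code):

```python
# Hyperparameter grid for the architecture search described in the text
hidden_layers = [1, 2, 3, 4]
neurons_per_layer = [8, 16, 32, 64]
learning_rates = [1e-2, 1e-3, 1e-4]
repetitions = 5  # random restarts per configuration

runs = [(h, n, lr, rep)
        for h in hidden_layers
        for n in neurons_per_layer
        for lr in learning_rates
        for rep in range(repetitions)]
print(len(runs))  # prints 240
```

Each run would train one candidate network; the configuration with the lowest cross-validation loss is then retained.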

Prediction of Duncan-Chang model parameters through neural networks
The ability of the NNs described in Section 3.2 to predict the values of E_0 and σ_ult from PSD information is discussed in this section. In order to assess the prediction ability, we consider the discrepancies between NN predictions and DEM measurements for the same PSD. The relative discrepancy for the model parameter x (E_0 or σ_ult) associated with a PSD is evaluated as

Δ_x = (x_[NN] − x_[DEM]) / x_[DEM],

where x_[DEM] is the DEM measurement and x_[NN] is the NN estimation. In contrast to discrepancies, errors are defined with respect to the expected value of each model parameter, x̄ (Ē_0 or σ̄_ult), associated with a PSD. The expected values are, however, usually unknown: they would be obtained with an infinitely large sample or by averaging over many random realizations of the triaxial test with packings following the same PSD. As mentioned above, a randomly chosen test dataset of 20% of the sample cases (18,476 out of 92,378) was used for testing the network's accuracy. These data are new to the network, in the sense that they were used neither during training nor during cross-validation. As presented below, different networks were trained using varying numbers of samples; nevertheless, the test dataset used for evaluating the networks' performance was kept constant along all the considered cases.

Neural network ability to predict the Duncan-Chang model parameters
Let us consider the network NN1 (see Table 3), trained using the full training dataset (see Section 3.2). For this specific NN, the corresponding distributions of relative discrepancies within the test dataset are shown in Fig. 8. NN1 anticipated values with discrepancies that fluctuated around 0. The standard deviations were s_ΔE0 = 0.046 and s_Δσult = 0.115. The accuracy for the tangent elastic modulus is higher than for the ultimate deviatoric stress. This is consistent with the fact that the precision of the measurement is higher and the observed variation across the considered cases is lower for E_0. For the sake of comparison, if the outcomes of the NN had been the sample means or random values, the average discrepancies would have been considerably higher. We checked this by using random estimations that followed either the observed probability distribution functions (PDFs) of E_0 and σ_ult (Figs. 4(a) and 4(b)), or uniform distributions lying between E_0,min and E_0,max (or between σ_ult,min and σ_ult,max). This is summarized in Table 4. These results evidence the ability of the NN to predict the model parameters.
To correctly assess the accuracy of the NN, it is worth recalling that the data are rather noisy (DEM measurements with coefficients of variation CV_M around 0.05 for E_0 and 0.10 for σ_ult; see Section 2.3). This unveils the existence of correlations between the PSD and the macroscopic mechanical behavior of granular materials that are encoded in the NN and that, in the light of the results presented in Section 2.4, seemed hidden. This result is even more interesting taking into account that the DEM data used for training the NN are noisy, as discussed below in Section 4.3.

Neural network accuracy with respect to the size of the DEM training dataset

Once we knew the expected accuracy of the NN predictions, we progressively reduced the size of the training datasets. Along all the presented results, the test cases were always the same 20% subset of the total. Our goal was to estimate the number of DEM tests (out of the 73,902 possible) needed to effectively train the NN without significantly compromising its accuracy. To assess the accuracy of a NN trained with a certain subset of the training dataset, we evaluated the network on the test dataset and computed the mean squared deviations, MSD_r[x], where x refers to the model parameter (either E_0 or σ_ult). The subscript r denotes the percentage of the potential training cases (out of the 73,902 possible) used in each training set; e.g., r = 10% means that only 7,390 samples of the training dataset were used to train the network. MSD_r[x] is defined as the mean of the squared relative discrepancies Δ_x over the test dataset for the network trained with N_r cases, where N_r is the number of cases used in the training set, i.e., N_r = floor(73,902 × r/100). Figure 9 shows that the performance of the NN is barely affected by the size of the training dataset, even when this is drastically reduced. The networks used in Fig. 9 are defined in Table 3. With only 1% of the potential cases (around 700 DEM experiments), the network NN4 was able to predict the Duncan-Chang model parameters for the test dataset (18,476 samples) with almost the same accuracy as NN1. Thus, we conclude that it is possible to train a NN that accurately predicts the Duncan-Chang parameters from PSDs using a dataset with fewer than one thousand DEM simulations. It is also important to remark that predicting the model parameters for a new PSD would take more than 1 hour of computing time using a DEM model analogous to the ones used in this work, whereas using an already trained NN takes on the order of microseconds.
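The training-set sizes quoted above follow directly from the definition of N_r:

```python
from math import floor

def n_training_cases(r, pool=73902):
    """N_r = floor(73,902 * r / 100): number of training cases for a
    given percentage r of the potential training pool."""
    return floor(pool * r / 100)

print(n_training_cases(10), n_training_cases(1))  # prints 7390 739
```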
As can be seen in Fig. 9, the accuracy dropped for networks that were trained with less than 1% of the potential training samples. For these small datasets, the accuracy of the NN predictions tended to the value obtained when the sample means of E 0 and σ ult are used as estimations.

4.3. Neural network robustness with respect to noisy DEM training data
As can be observed in Fig. 8, despite the good agreement between NN and DEM predictions for most of the test cases, the discrepancies were relatively high in some of them. Using network NN1 (see Table 3), the maximum absolute discrepancies were ∆ E0,max = 0.227 and ∆ σult,max = 0.412. Nevertheless, a high discrepancy only means that the DEM and NN estimations do not agree; it does not necessarily imply that the NN prediction is wrong.
In order to determine whether these discrepancies were due to the inability of the NN to reproduce the DEM estimation or to unlikely estimations of the model parameters from the DEM, we repeated the virtual triaxial tests in the cases with the highest discrepancies. We considered the networks trained with 1%, 5%, 10% and 100% of the potential training cases (networks NN1, NN2, NN3 and NN4 in Table 3). We identified the 100 predictions with the highest deviations in E 0 and σ ult for the networks NN2, NN3 and NN4, and the worst 1,000 deviations for the network NN1. Many of them overlapped, so we finally selected around 1,450 experiments to repeat. It is worth emphasizing that these cases were not repeated in order to retrain the NN and achieve a better match with different data; the NNs remained unchanged.
After the repetition of these simulations, the relative discrepancies were considerably reduced in most of these cases (see Figs. 9 and 10). The standard deviations of the discrepancies over the whole test dataset were also reduced: s ∆E0 went from 0.046 to 0.037 and s ∆σult went from 0.115 to 0.090. This proves that, for the repeated cases, the first DEM measurement was very unlikely, whereas the NN prediction was much more accurate.

Figure 10: Discrepancies between NN predictions and the first and second DEM measurements in the cases that had given the highest discrepancies when comparing the first DEM measurement to the NN predictions: (a) relative discrepancy in E 0; (b) relative discrepancy in σ ult. Results obtained with the network NN1 (see Table 3).

This result does not come as a surprise: as already highlighted in some recent works (see, e.g., [57]), NNs are robust, to a certain extent, with respect to mislabeled or noisy training data. In the context of this work, this can be understood from the fact that the NN was trained on datasets containing many test cases corresponding to PSDs very close to the troublesome ones; therefore the NN downplays the contribution of outliers. Thus, as done in this work, the trained NN can also be used as a tool for identifying unlikely DEM estimations.
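The screening step described in this section, selecting the test cases with the largest NN-DEM discrepancies for re-simulation, can be sketched as follows (an illustrative fragment; `worst_cases` and the array names are assumptions, not the authors' code):

```python
import numpy as np

def worst_cases(x_nn, x_dem, n_worst=100):
    """Return the indices of the n_worst test cases with the largest
    absolute discrepancy |x_NN - x_DEM|, sorted from worst to best.
    These cases are candidates for repeating the DEM triaxial test;
    the network itself is NOT retrained."""
    disc = np.abs(np.asarray(x_nn, float) - np.asarray(x_dem, float))
    return np.argsort(disc)[::-1][:n_worst]
```

The per-parameter and per-network selections can then be merged (e.g., with `np.union1d`) before re-running the selected DEM simulations, which is how overlapping selections collapse to a single list of around 1,450 experiments.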
In order to further support this claim, one of the cases with the highest discrepancy between the NN estimation and the DEM measurement (PSD case 59861, see Fig. 11(a)) was analyzed more thoroughly. This PSD was used to randomly generate 1,000 new packings that were subjected to DEM triaxial compression. The stress-strain curves were fitted to the Duncan-Chang model, generating statistical samples of the E 0 and σ ult values for this single PSD. With such large samples we could estimate the expected values of the model parameters for PSD 59861 from the sample means. Focusing on the tangent stiffness E 0, the obtained sample mean was Ē 0 /E = 7.461 · 10^-4, its standard deviation was s E0 /E = 2.791 · 10^-5 (CoV E0 = 0.037), and the minimum and maximum values were E 0,min /Ē 0 = 0.895 and E 0,max /Ē 0 = 1.262, respectively. Figure 11(b) shows the histogram of E 0 for these 1,000 triaxial tests. The value estimated in the first DEM test was E 0,[DEM] /E = 9.413 · 10^-4. Therefore, the first DEM measurement provided a very unlikely model parameter (|E 0,[DEM 0 ] − Ē 0 | = 6.992 s E0 ), and this is the reason why the discrepancy with the NN estimation was so large. When the experiment was repeated, the new DEM estimation was E 0,[DEM 1 ] /E = 7.790 · 10^-4, which is considerably closer to the expected value (|E 0,[DEM 1 ] − Ē 0 | = 1.175 s E0 ). In contrast, the NN (which was trained on noisy data) predicted a tangent stiffness of E 0,[NN] /E = 7.456 · 10^-4, remarkably close to the expected value (|E 0,[NN] − Ē 0 | = 0.019 s E0 ).
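The "unlikeliness" of a single DEM estimate, quoted above in units of the sample standard deviation, can be computed as follows (a sketch; `deviation_in_sigmas` is an illustrative name, not the authors' code):

```python
import numpy as np

def deviation_in_sigmas(x_single, sample):
    """Distance of a single estimate from the sample mean of repeated
    tests, measured in sample standard deviations (ddof=1)."""
    sample = np.asarray(sample, dtype=float)
    return abs(x_single - sample.mean()) / sample.std(ddof=1)

# With the sample statistics reported for PSD 59861
# (mean 7.461e-4, std 2.791e-5), the first DEM estimate 9.413e-4
# lies roughly 7 standard deviations away, i.e. it is very unlikely.
```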

Conclusions
We selected 92,378 particle size distributions, PSDs, lying within two particle sizes. We performed virtual triaxial tests with the DEM on samples that followed these PSDs. We fitted the resulting stress-strain curves to Duncan-Chang hyperbolic models, gathering a statistical sample of the two model parameters, namely E 0 and σ ult. We found variations of these parameters across the statistical sample that are not easily associated with the PSD. The parameters followed bell-shaped distributions: in the case of E 0, CV E0 = s E0 /Ē 0 = 0.11 and E 0,max /E 0,min = 2.4; in the case of σ ult, CV σult = s σult /σ̄ ult = 0.13 and σ ult,max /σ ult,min = 3.4. Because of the finite number of particles used in each experiment (20,000), the parameters obtained from a single DEM simulation may fluctuate to some extent around the expected values (with coefficients of variation for measurements of CV M E0 = s E0 /Ē 0 ≈ 0.05 and CV M σult = s σult /σ̄ ult ≈ 0.10). We compared the DEM measurements to common statistical and geotechnical descriptors derived from the PSD but did not find any correlation. More precisely, we used the coefficient of uniformity, the coefficient of curvature, the mean size, the standard deviation, the skewness and the excess kurtosis. In contrast, by using a neural network, NN, trained with a dataset generated through DEM simulations, we were able to predict the expected model parameters for each experiment with high accuracy. The input for this NN was directly the PSD and the output was the model parameters. We tried several NN architectures. 20% of the dataset was used to test the ability of the networks to anticipate the model parameters. The size of the training dataset varied between 0.1% and 100% of the remaining DEM experiments.
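The Duncan-Chang fitting step summarized above can be sketched via the standard linearization of the hyperbola, ε/q = 1/E0 + ε/σ_ult (intercept 1/E0, slope 1/σ_ult). This is a generic illustration of the technique, not the authors' fitting code, and the function name is an assumption:

```python
import numpy as np

def fit_duncan_chang(eps, q):
    """Fit the Duncan-Chang hyperbola q = eps / (1/E0 + eps/sigma_ult)
    by linear regression of the transformed variable eps/q versus eps.
    Returns (E0, sigma_ult)."""
    eps, q = np.asarray(eps, float), np.asarray(q, float)
    mask = q > 0  # eps/q is undefined where q = 0
    slope, intercept = np.polyfit(eps[mask], eps[mask] / q[mask], 1)
    return 1.0 / intercept, 1.0 / slope

# Synthetic check: data generated from a known hyperbola are recovered.
eps = np.linspace(1e-3, 0.1, 50)
q = eps / (1.0 / 50.0 + eps / 2.0)
E0, sigma_ult = fit_duncan_chang(eps, q)
```

A nonlinear least-squares fit of the hyperbola directly would serve equally well; the linearized form is simply the classical way of extracting the two Duncan-Chang parameters from a stress-strain curve.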
We observed that the maximum accuracy of the NN is similar to the precision of the measurements. This accuracy is already achieved with a training dataset of 1% of the potential training cases (about 700 DEM simulations). This means that, using NNs trained with fewer than one thousand triaxial experiments, it is possible to accurately predict the macroscopic mechanical behavior of granular materials from their PSD alone.
We also observed that the largest discrepancies between NN predictions and DEM measurements occurred precisely when the DEM experiments led to unlikely values in the first simulation. Therefore, the NN was also useful for identifying unlikely DEM results. The key to achieving more accurate estimations seems to be the reduction of the noise in the data.
The PSD clearly affects the mechanical behavior of granular materials, so there must exist relationships linking the mechanical behavior to the PSD; yet these relationships are hidden to the naked eye, and not even statistical or geotechnical descriptors that quantify the PSD to some extent allowed us to establish them. In contrast, neural networks were capable of finding those relationships. This research opens a way to address other problems (e.g., different stress-strain paths or sample preparation procedures) with the objective of better understanding the relationship between PSDs and the macroscopic behavior. The great advantage of combining DEM with NNs is that we can learn much more by simulating much less.