Application of Artificial Neural Networks for Predicting the Bearing Capacity of Shallow Foundations on Rock Masses

Calculation of the bearing capacity of shallow foundations on rock masses is usually addressed using empirical equations, analytical solutions, or numerical models. While empirical laws are limited to the particular conditions and local geology of the underlying data, and the application of analytical solutions is complex and restricted by their simplifying assumptions, numerical models offer a reliable solution but require more computational effort. This research presents an artificial neural network (ANN) that predicts the bearing capacity against general shear failure in a simpler and more straightforward way. The network is trained on FLAC numerical calculations based on the Hoek and Brown criterion, which reproduce more realistic configurations than those covered by empirical or analytical solutions. The inputs of the proposed ANN are rock type, uniaxial compressive strength, geological strength index, foundation width, dilatancy, bidimensional or axisymmetric geometry, roughness of the foundation-rock contact, and consideration or not of the self-weight of the rock mass. The predictions of the ANN model are in very good agreement with the numerical results, proving that it can be successfully employed to provide a very accurate assessment of the bearing capacity in a simpler and more accessible way than the existing methods.


Introduction
Shallow foundations are the type of foundation most commonly used to transmit the loads applied by civil engineering or building structures to the underlying soil or rock. Depending on the properties of the rock mass and the layer beneath, the failure of rocks under applied loads may occur through several mechanisms (Sowers 1979; Paikowsky et al. 2010; Canadian Geotechnical Society 2006). These mechanisms are presented in Fig. 1 and correspond to: (a) general shear failure; (b) local failure when the discontinuity spacing is larger than the foundation width; (c) failure of the underlying rock columns created by a discontinuity spacing smaller than the foundation width; (d) punching shear failure following vertical joints existing below the contact zone or produced by shallow cavities.
These failure modes delimit the range of application of the different analytical or numerical formulations and the validity of empirical methods based on field trials. Local conditions are more representative of cases (b), (c), and (d), which depend strongly on the discontinuity spacing and the foundation width, so that their bearing capacity can be deduced from simplified mathematical models, field trials, and reduced-scale models, where such local conditions are more easily represented. However, the foundations of large civil engineering works, which involve rock masses with varied distributions of discontinuities (types I, IV, or V, Fig. 2), respond in a general way to the global failure mode defined by case (a). In this case, empirical experience is less reliable because of its local character, and it is necessary to resort to analytical and/or numerical techniques.
The limitations under which analytical solutions can be established and the high computational costs required to guarantee the convergence and stability of numerical solutions (both are analyzed in detail in the following sections) make it advisable to offer simple formulations that provide a reliable, rapidly applicable solution for the calculation of the bearing capacity under global failure of a rock mass.
Considering the intrinsic difficulty of assessing many engineering problems related to technical design, soft computing has appeared as an alternative to the common analytical and numerical approaches. The ANN is one such soft computing technique: it learns from a sufficiently dense data set and configures a black-box-type prediction model that solves the problem in the form of a simple closed equation.
Many applications of ANNs have been presented in recent years, particularly in the Geotechnical Engineering field, as discussed in the review papers by Shahin et al. (2001, 2008), expanding to almost every problem in the field. Regarding shallow foundations, the specific subject of this research, many applications of neural networks have been presented. The problem of shallow foundation settlement on cohesionless soils was addressed first, for example by Sivakugan et al. (1998), Chen et al. (2006), and Shahin et al. (2002a, 2002b, 2003a, 2003b, 2004a, 2005a, 2005b). Later, other researchers used perceptrons to obtain alternative approaches to the problem of shallow foundations, predicting the bearing capacity of strip footings on multi-layered cohesive soil (Kuo et al. 2009) or on cohesionless soils using neuro-fuzzy models (Provenzano et al. 2004; Padmini et al. 2008).
In the rock mechanics field, some advances have been presented recently. Some of them concern the analysis by ANNs of the parameters defining rock behavior. Researchers such as Yang and Zhang (1997) used neural networks to identify the relative effect of each factor involved in a rock mechanics problem, such as the stability of underground openings. Mert et al. (2011) and Gholami et al. (2013) presented approaches to assess the total RMR classification system using simulations based on neural networks. Ocak and Seker (2012) developed a neural network to estimate the elastic modulus of intact rocks, since its determination in laboratory tests is difficult because high-quality cores are required. Yılmaz and Yuksek (2008) used ANNs to indirectly estimate rock parameters.
Very few results have been presented on the bearing capacity of shallow foundations on rocks. Ziaee et al. (2015), based on a comprehensive database of tests and reduced-scale model results, used four main parameters: rock mass rating, unconfined compressive strength of the rock, ratio of joint spacing to foundation width, and angle of internal friction of the rock mass. Alavi and Sadrossadat (2016) proposed precise predictive equations derived from linear genetic programming (LGP) for the ultimate bearing capacity of shallow foundations resting on a jointed (non-fractured) rock; a comprehensive and reliable set of data including 102 previously published rock socket, centrifuge rock socket, plate load, and large-scale footing load test results was collected to develop the models. These works, based on the results of load tests and small-scale tests, allow considering the incidence of local conditions in the development of local failure mechanisms. However, they do not focus on the general shear failures generated in sound or fractured rock masses, which can occur in large-scale civil works and whose adequate analysis must incorporate factors that are difficult to reproduce in small-scale tests, such as dilatancy or the roughness of the foundation contact. Thus, more research is needed, and this paper follows this line of exploration.
As the previous references show, ANNs appear as an adequate alternative approach to the problem, overcoming the limitations of empirical, analytical, and numerical methods, provided an extensive and accurate data set is used to build the network.
An artificial neural network (ANN) is proposed in this research to offer this alternative approach, introducing an enlarged set of parameters and adopting the widely accepted Hoek and Brown failure criterion for the rock. The network learns from a sufficiently dense data set and configures a black-box-type prediction model that solves the problem in the form of a set of simple closed equations.

Objectives
The main objective of this research is to develop a neural network to address the calculation of the bearing capacity of shallow foundations on rock masses from a different approach than in common practice, where empirical, analytical, and numerical methods are used.
The reason for using this alternative approach is to make available a method of calculation that is easy to use while avoiding the limitations of other simple methods such as empirical or analytical ones; in other words, a method with the accuracy of numerical models but without their complexity of use.
To reach this goal, it is necessary to consider a detailed analysis of the rock failure mechanism and of the analytical and numerical methods used to solve the problem of the bearing capacity, considering their respective limitations.
The simplifications adopted in the analytical solution (Serrano et al. 2000), namely plane strain, the associative flow rule, coaxiality, a perfectly plastic yield surface, and a weightless rock mass, together with the relative complexity of numerical models, which considerably raises computational costs, invite the consideration of new approaches. These approaches should relate the bearing capacity of shallow foundations on rock masses to an extended set of parameters, including those related to the rock itself and those related to the geometry and characteristics of the problem.
The neural network is built on an extended set of numerical calculations obtained using the commercial software FLAC, including the most influential parameters in the bearing capacity problem.
From this dataset, the network is trained and optimized, offering a fair balance between simplicity and accuracy. Finally, the generated ANN is converted into a set of simple equations that are easy to implement and use.

Identification of the Failure Mechanism Under Study
The heterogeneous and anisotropic nature that can locally characterize rock masses is determined by the presence of discontinuity families, so the study of the bearing capacity is highly conditioned by the distribution of discontinuities in the rock mass. Simplified and semi-empirical solutions are available for specific rock mass configurations (Bishoni 1968; Goodman 1989), indicating that the bearing capacity of shallow foundations on jointed rock masses depends on the ratio of joint spacing to foundation width, the joint condition, the rock type, and the condition of the underlying rock mass. Among these formulations, some stand out in regulations such as Eurocode 7 (2004), which includes a rather simplified method in which, as a function of the rock type, the ultimate bearing capacity is estimated from the uniaxial compressive strength and the spacing of the main joint set. However, these formulations do not take into account the important role of the rock type and its qualitative mass parameters, such as the rock quality designation (RQD), rock mass rating (RMR), or geological strength index (GSI), and therefore they are not useful for identifying general failure mechanisms, which involve a large volume of ground.
Empirical methods have also been proposed that establish a correlation between the bearing capacity and rock mass properties based on observations and experimental test results, such as Bowles (1996) or the method by Carter and Kulhawy (1988), recommended by AASHTO (2007) and based on a lower bound solution. These equations consider the particular and local conditions of the experiments but cannot take into account the geometry of the foundations or the spacing between joints in a general way.
The singularities induced by the distribution of discontinuities can be treated globally when the scale of the problem allows us to consider any of the situations indicated in Fig. 2, where the rock mass can be considered as a homogeneous and isotropic medium (cases I, IV, and V). This is the case in many of the main engineering works where large foundation surfaces are required, such as bridges, dams, or large building towers. This analysis corresponds to the study of the global shear failure of the ground and its analysis must be based on the choice of a suitable failure criterion for the rock mass.
Defining the value of the dilatancy angle is a very complex problem, and it is usual practice to assume either null dilatancy or the associative flow rule. According to the correlations proposed by Hoek and Brown (1997), the dilatancy lies between 0º and a quarter of the friction angle, depending on the geotechnical quality of the rock mass. From this perspective, it is considered reasonable to study the extreme cases of the problem, from the conservative side (null dilatancy) to the more optimistic side (associated dilatancy).

Hoek and Brown Failure Criterion
In rock mechanics, the non-linear Hoek and Brown failure criterion (Hoek and Brown 1980, 1997; Hoek et al. 2002) is the most widely used, and it is applicable to rock masses with homogeneous and isotropic behavior, that is, with the same physical properties in all directions because of the inexistence or the abundance of discontinuities. The Hoek and Brown failure criterion relates the major principal stress σ1 and the minor principal stress σ3 according to the following equation:

σ1 = σ3 + σc·(m·σ3/σc + s)^a (1)

The uniaxial compressive strength (UCS) is σc, while the parameters m, s, and a can be evaluated by (2), (3), and (4) and depend on the rock type (m0), the geotechnical quality of the rock mass (GSI), and the damage in the rock mass due to human actions (D), which in shallow foundations is usually equal to zero:

m = m0·e^((GSI−100)/(28−14·D)) (2)

s = e^((GSI−100)/(9−3·D)) (3)

a = 1/2 + (1/6)·(e^(−GSI/15) − e^(−20/3)) (4)

This criterion defines the failure envelope for a rock mass and allows the study of the mechanism of general shear failure in the ground for the problem of the bearing capacity of shallow foundations.
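As a minimal illustration, the criterion can be evaluated with a short routine (function names are ours; the formulas are Eqs. 1-4 above):

```python
import math

def hoek_brown_params(m0, gsi, d=0.0):
    """Rock-mass parameters m, s, a (Eqs. 2-4) from the rock type m0,
    the geological strength index GSI and the disturbance factor D."""
    m = m0 * math.exp((gsi - 100.0) / (28.0 - 14.0 * d))
    s = math.exp((gsi - 100.0) / (9.0 - 3.0 * d))
    a = 0.5 + (math.exp(-gsi / 15.0) - math.exp(-20.0 / 3.0)) / 6.0
    return m, s, a

def sigma1_at_failure(sigma3, ucs, m0, gsi, d=0.0):
    """Major principal stress at failure (Eq. 1) for a given sigma3."""
    m, s, a = hoek_brown_params(m0, gsi, d)
    return sigma3 + ucs * (m * sigma3 / ucs + s) ** a
```

A useful sanity check: for an intact rock mass (GSI = 100, D = 0) the parameters reduce to m = m0, s = 1, a = 0.5, and the unconfined (σ3 = 0) failure stress equals the UCS.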

Calculation Methods of the Global Shear Failure Mechanism
Over the years, several methods have been used to study the bearing capacity under global failure of the ground: the limit equilibrium method (Terzaghi 1943; Meyerhof 1951), the slip line method (Sokolovskii 1965), the limit analysis method (Sloan 1988; Sloan and Kleeman 1995), and numerical methods (Griffiths 1982; Merifield et al. 2006). The traditional analytical solutions to estimate the ultimate bearing capacity in soils (Terzaghi 1943; Brinch-Hansen 1970) were developed for the linear Mohr-Coulomb failure criterion, which depends on the cohesion and internal friction angle of the material. Since the development of the non-linear Hoek and Brown failure criterion for rock masses (Hoek and Brown 1980, 1997; Hoek et al. 2002), the equivalent strength parameters of the Mohr-Coulomb failure criterion (cohesion and friction angle) have been assessed at the corresponding stress level of the rock mass to introduce them into the traditional formulations. However, some authors, such as Merifield et al. (2006), observed that the use of equivalent Mohr-Coulomb strength parameters overestimates the bearing capacity by up to 157% in the case of a good quality rock mass (GSI about 75).
To calculate the bearing capacity, Yang and Yin (2005) applied the multi-wedge translation failure mechanism and the tangential line technique, using the upper bound limit theory for strip foundations based on a modified Hoek and Brown failure criterion and deducing the equivalent parameters of the Mohr-Coulomb failure criterion. Saada et al. (2008) also proposed a method to calculate the bearing capacity based on the limit theories, applying the Hoek and Brown failure criterion and deducing the equivalent Mohr-Coulomb parameters, which provided a better fit than the results obtained by Yang and Yin (2005). However, these methods do not claim to be general in the presentation of their results, and their implementation requires a specific analysis. Thus, a specific study is necessary that expressly considers the markedly non-linear character of the rock mass, for which analytical and numerical methods have been developed.

Analytical Method
The analytical method that solves the internal equilibrium equations combined with the failure criterion was proposed by Serrano and Olalla (1994) and Serrano et al. (2000), applying the Hoek and Brown failure criterion (Hoek and Brown 1997) and the modified Hoek and Brown failure criterion (Hoek et al. 2002), respectively. It is based on the characteristic line method (Sokolovskii 1965) with the hypotheses of plane strain, associative flow rule, coaxiality, perfectly plastic yield surface, and weightless rock mass.
According to this analytical formulation, the ground surface that supports the foundation is composed of two sectors (Fig. 3): boundary 1, with inclination α, where the load acting on the surface is known (for example, the ground load at the foundation level or the load from installed anchors), acting with inclination i1; and boundary 2, where the bearing capacity of the foundation is to be determined, acting with inclination i2.
The failure criterion of Hoek and Brown (1) is non-linear, so the inclination of the curve in the shear stress (τ)-normal stress (σ) diagram (which in rock mechanics is associated with the instantaneous friction angle of the material) depends on the stress state.
The analytical solution is based on the characteristic line method, with the equation of the Riemann invariants fulfilled along the characteristic lines (Serrano et al. 2000). From the overload on the surface, the instantaneous friction angle and the direction of the principal stresses at boundary 1 can be obtained (see Fig. 3); then, using the Riemann invariants, both the instantaneous friction angle below the foundation (boundary 2, see Fig. 3) and the ultimate bearing capacity can be estimated.

Numerical Methods
Finite element limit analysis has been one of the possibilities for obtaining collapse loads. Calculations using the finite element method under the lower and upper bound theorems (Sloan 1988; Sloan and Kleeman 1995) were used by Zheng et al. (2000) and Sutcliffe et al. (2004) to determine the bearing capacity of fractured and jointed rock masses. Later, Merifield et al. (2006) applied the limit theorems (upper and lower bound), as an extension of the formulation developed by Lyamin and Sloan (2002a, b), to determine the bearing capacity of a fractured rock mass whose behavior follows the Hoek and Brown criterion. More recently, lower-bound finite element limit analyses in combination with semidefinite programming (SDP) or nonlinear optimization (Kumar and Khatri 2011; Kumar and Mohapatra 2017; Chakraborty and Kumar 2015) have been applied to solve stability problems involving the modified Hoek and Brown (Hoek et al. 2002) yield criterion in the rock mass, in particular the bearing capacity of both circular and strip footings.
Finite element limit analysis computes the upper or lower limit load using optimization techniques, rather than stepping in time and increasing the system load up to a collapse load as conventional non-linear finite element techniques do. It may use linear programming on a linearized yield surface, or second-order cone programming to address the common non-linear yield surfaces directly (Makrodimopoulos and Martin 2006). However, as reported by Smith and Gilbert (2007), the solutions obtained are often highly sensitive to the geometry of the original finite element mesh, though this problem may be overcome by adaptive remeshing techniques at the cost of a more complex procedure.
Similar limitations have been found with other numerical techniques such as the finite difference method. For example, some authors reported the difficulty of obtaining stable results for highly fractured rock masses with high values of m0 and low values of GSI (Merifield et al. 2006). Contrary to the case of the linear Mohr-Coulomb criterion used in soil mechanics, the application of the non-linear Hoek and Brown criterion has produced discrepancies in results among researchers when evaluating the bearing capacity of a foundation (Alencar et al. 2019), due to the need for an exhaustive numerical study.

Generation of the Bearing Capacity Results Database
To apply the ANN technique, it is necessary to have a database of reliable results that is extensive enough to cover the range of variation of the parameters identified as essential for defining the global bearing capacity of the rock mass, and that allows training and properly validating the developed neural network. The first step is to identify the parameters that influence the evaluation of the bearing capacity of a foundation on a rock mass. Thus, the geomechanical, geometric, and soil-structure interaction parameters indicated in Table 1 are considered.
Next, the calculation cases have been established by varying the parameters across their ranges of variation to generate the data set used to build and calibrate the ANN model. Table 2 shows the values adopted for the parameters, yielding a series of 2762 calculation simulations.
Finally, it is necessary to generate reliable results for the bearing capacity of shallow foundations on rock masses for each simulation case. For this, the commercial geotechnical program FLAC (Itasca Consulting Group 2012) has been used. FLAC applies the finite difference method (FDM) to solve geotechnical problems. The software contains different constitutive models, such as Mohr-Coulomb and Hoek and Brown, using a tangent linear approximation for the latter failure criterion. In the FDM, it is assumed that the bearing capacity is reached when the continuous medium supports no more load because an internal failure mechanism has formed. However, an accurate calculation requires an extensive and laborious convergence analysis of the different numerical parameters, which becomes costlier in computational terms due to the markedly non-linear character of the Hoek and Brown failure criterion that governs rock masses. To guarantee the precision of the solution, the following have been numerically controlled for each of the calculation simulations: (1) the linearization of the Hoek and Brown criterion of the rock mass; (2) the nodal distance; and (3) the velocity condition applied to the boundary nodes.
Regarding the linearization of the failure criterion, it should be noted that an adaptive discretization has been adopted for each zone of the numerical model: in each calculation step, the stress state in the zone defines the point of failure on the Hoek and Brown criterion, and the linearized failure line is the one tangent to the failure surface at that point. This adaptive linearization of the failure criterion is extended to all the zones of the numerical model and is iterated until the stresses in all zones stabilize.
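The tangent (instantaneous) Mohr-Coulomb parameters at a given stress level can be sketched as below. This is the standard derivation in principal-stress space, not FLAC's internal implementation, and the function name is ours:

```python
import math

def tangent_mohr_coulomb(sigma3, ucs, m, s, a):
    """Instantaneous friction angle (degrees) and cohesion of the
    Mohr-Coulomb line tangent to the Hoek and Brown envelope at sigma3."""
    k = 1.0 + a * m * (m * sigma3 / ucs + s) ** (a - 1.0)   # slope d(sigma1)/d(sigma3)
    phi = math.asin((k - 1.0) / (k + 1.0))                  # instantaneous friction angle
    sigma1 = sigma3 + ucs * (m * sigma3 / ucs + s) ** a     # Hoek and Brown (Eq. 1)
    intercept = sigma1 - k * sigma3                         # tangent-line value at sigma3 = 0
    cohesion = intercept * (1.0 - math.sin(phi)) / (2.0 * math.cos(phi))
    return math.degrees(phi), cohesion
```

The slope-to-friction relation sin φ = (k − 1)/(k + 1) follows from the Mohr-Coulomb criterion written in principal stresses, σ1 = σ3·(1 + sin φ)/(1 − sin φ) + 2c·cos φ/(1 − sin φ).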
In the case under study, the vertical load was obtained through a constant velocity condition (low enough to reproduce quasi-static load conditions) applied to the boundary nodes, and the bearing capacity was determined from the relation between stresses and displacements of one of the nodes; in this case, the central node of the foundation was considered.
The numerical control of the velocity of the nodes and of the nodal distance implies carrying out several similar calculations with different values of each numerical parameter until the solution stabilizes. Fig. 4 shows the 2D finite-difference model used to calculate the different cases, applying a plane strain condition to a symmetric model where only half of the strip footing is represented. Fig. 5 presents the convergence control of both numerical parameters for one of the numerical simulations carried out with FLAC, corresponding to the values m0 = 12, UCS = 50 MPa, GSI = 50, B = 4.5 m, associated dilatancy, plane strain, rough contact, and weightless rock.

Overview of Back-Propagation Neural Networks
A comprehensive description and analysis of the types and uses of back-propagation perceptrons is beyond the scope of this paper and can be found elsewhere (for example, Zurada 1992; Fausett 1994). A multilayer perceptron (MLP) consists of multiple layers of interconnected computational units (nodes or neurons), in this case connected in a feed-forward way. The first layer is formed by the input neurons, connected to one or more hidden layers of neurons and followed by the final layer of output neurons. The output from each node i in the previous layer, x_i, is multiplied by an adjustable connection weight w_ji when arriving at node j in the current layer, and a threshold value or bias θ_j is added or subtracted. This combined input I_j is then passed through a non-linear transfer function (a sigmoidal or tanh function), and the result becomes the input for the next layer. This process is summarized in Figs. 6 and 7, and Eqs. 5 and 6. It starts from the input layer, fed by a set of data presented to the network for the learning process. The final output obtained from these data is compared against the known results, and a measure of some predefined error function is obtained. Using different optimization techniques, the error is fed back through the network, and the algorithm adjusts the weights of each connection to reduce the value of the error function by some small amount. Repeating this process for a large enough number of training cycles, the network eventually converges to a state with a small error, meaning that the network has learned the function.
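The forward pass just described can be sketched for one hidden layer as follows (a schematic with illustrative weight containers, not the trained network of this paper):

```python
import math

def sigmoid(x):
    """Sigmoidal transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, weights, biases):
    """One forward pass: I_j = sum_i(w_ji * x_i) + theta_j at each hidden
    neuron, a sigmoidal transfer, then a linear output neuron."""
    hidden = [sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
              for w_row, b in zip(weights["hidden"], biases["hidden"])]
    return sum(w * h for w, h in zip(weights["out"], hidden)) + biases["out"]
```

Training amounts to adjusting the entries of `weights` and `biases` so that the error between `forward(...)` and the known targets decreases.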
An excessive number of training cycles may produce what is called overfitting, giving smaller errors for the training set but reducing the capacity of the net to predict and generalize to other sets. A stopping criterion is needed to avoid overfitting, the cross-validation technique being the most accepted in the field; it is introduced in the next section.

ANN Model Development
In this research, the artificial neural network for predicting the bearing capacity of shallow foundations on rock masses is developed using the Matlab computer software utilities (Mathworks Inc. 2019a). The data used to build and calibrate the model is obtained from a series of 2762 numerical simulations of the problem using the FLAC software based on finite differences, changing the different parameters affecting the model behavior.

Selection of the Inputs and Outputs
In this particular case, as in most cases in Geotechnical Engineering, the parameter selection for the neural network is based on the knowledge of the physical problem underlying the behavior of the system.
The main parameters defining the bearing capacity of a shallow foundation on a rock mass are three parameters characterizing the rock mass, namely the rock type (m0), the uniaxial compressive strength (UCS), and the geological strength index (GSI), plus the foundation width (B). However, as introduced in Table 2, some other parameters also influence the results and allow overcoming the limitations of analytical and empirical formulations; they are also considered as inputs: the dilatancy on the failure surface (assigning a value of 0 when the dilatancy angle is ψ = 0 and 1 for ψ = ρ), bidimensional or axisymmetric problem (assigned values of 0 and 1, respectively), rough or roughless foundation-rock contact (0 and 1, respectively), and weightless or self-weighted rock mass (0 and 1, respectively).
Therefore, the total number of inputs is 8 and the bearing capacity of the foundation is the single output variable.
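With this selection, each calculation case maps to an 8-component input vector; a minimal sketch of the encoding (function name and argument order are ours):

```python
def encode_case(m0, ucs, gsi, width,
                associated_dilatancy, axisymmetric,
                roughless_contact, self_weighted):
    """Build the 8-input vector: m0, UCS, GSI, B, plus four binary flags
    following the 0/1 convention described in the text."""
    return [float(m0), float(ucs), float(gsi), float(width),
            1.0 if associated_dilatancy else 0.0,
            1.0 if axisymmetric else 0.0,
            1.0 if roughless_contact else 0.0,
            1.0 if self_weighted else 0.0]
```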

Data Division and Statistical Analysis
When creating a neural network, one of the basic issues is training the net only up to the point beyond which it loses its capacity for accurate predictions on a different set of inputs. This situation is known as over-fitting and can be avoided if training and checking of the prediction capabilities are developed simultaneously using different subsets of the input data.
The total number of available inputs is divided into three sets: training, testing, and validation. This division allows cross-validation of the net, as proposed by Stone (1974). The training set is used to adjust the weights of the net nodes; the testing set is used to check the performance of the model at different stages of the training process and to determine when to stop to avoid over-fitting. The validation set allows an independent check of the network once the training process is finished and the weights of each connection are already defined. The data are divided so that the training set contains 1658 cases and the testing and validation sets contain 332 cases each.
The above-mentioned division should statistically represent the whole population of inputs (following Masters 1993). Hence, the statistical properties of the different subsets (e.g., mean, standard deviation, and range) should match those of the complete set, so that all patterns contained in the original set of inputs are reproduced in each of the subsets. Since the division is performed randomly, an iterative process is adopted until a group of training, testing, and validation sets consistent with the statistical properties is achieved.
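This iterative random division can be sketched as follows, simplified to matching only the mean of a single variable within a tolerance (the actual procedure also checks standard deviation, minimum, maximum, and range):

```python
import random
import statistics

def split_until_consistent(data, key, n_train, n_test, tol=0.05, seed=0):
    """Randomly split data into training/testing/validation sets, retrying
    until each subset's mean of key(item) lies within a relative tolerance
    of the full set's mean. Illustrative consistency criterion only."""
    rng = random.Random(seed)
    target = statistics.mean(key(d) for d in data)
    while True:
        shuffled = data[:]
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        test = shuffled[n_train:n_train + n_test]
        valid = shuffled[n_train + n_test:]
        if all(abs(statistics.mean(key(d) for d in subset) - target)
               <= tol * abs(target)
               for subset in (train, test, valid)):
            return train, test, valid
```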
The statistical properties considered are mean, standard deviation, minimum, maximum, and range, as suggested by Shahin et al. (2004b). Table 3 shows the results obtained for the different input parameters. It is clear that, as the input population is large, there is no problem in the statistical matching among the different subsets.
The validity of the network is limited by the range of parameters included in the training data. Consequently, the performance is better when no extrapolation is made beyond that range.

Data Scaling
In most ANNs, the input parameters are scaled to a (0, 1) or (−1, 1) range before training, to eliminate their dimensions and ensure all of them are treated homogeneously while training the network; this works well when the data distribution is more or less homogeneous. However, that scaling procedure, used alone, did not produce good results in this research.
One of the most important problems found in building the network stems from the distribution of the output values of bearing capacity (P_h) over the data range, which is extremely concentrated around the small values of P_h (Fig. 8a). Since most of the P_h values are very small compared to the whole range, using only regular scaling created very important percentage errors for the smaller values, even when the global error of the ANN approximation was very small.
Adding a logarithmic scaling (natural log) of the P_h targets before the regular (0, 1) scaling solves the problem, since it produces a more homogeneous distribution of the data along the P_h range. The data distributions using logarithmic scaling alone and, as the final step, with log scaling and (0, 1) scaling applied simultaneously are shown in Fig. 8b and c, respectively. It is important to note that the log scaling is applied exclusively to the output values P_h, and not to any input value.
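The two-step target scaling and its inverse (needed to recover bearing-capacity predictions from the network output) can be sketched as (function names are ours):

```python
import math

def scale_targets(ph_values):
    """Two-step scaling of the P_h targets: natural log, then (0, 1)
    min-max. Returns the scaled values plus the constants to invert."""
    logs = [math.log(p) for p in ph_values]
    lo, hi = min(logs), max(logs)
    scaled = [(v - lo) / (hi - lo) for v in logs]
    return scaled, lo, hi

def unscale_target(y, lo, hi):
    """Invert both scaling steps to recover a bearing-capacity value."""
    return math.exp(lo + y * (hi - lo))
```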

Determination of Network Architecture and Internal Parameters
As introduced in Sect. 5, the neural network is a multilayer set of neurons (input layer, hidden layers, output layer), interconnected by inputs from the previous layer and outputs to the next layer. At each neuron, the inputs are changed into outputs by applying weights and biases and a transfer function that changes their values. Building the net means defining the model architecture (layers and neurons) and training it (weights and biases). Although the input layer and output layer are predefined by the problem under study and the intended results, the hidden layers are not. Defining the model architecture requires selecting the optimal number of hidden layers in the network and determining the optimal number of neurons in each layer; this involves building many networks of different complexity and selecting the optimum.
There is no unified theory for obtaining the optimal number of layers in a network, so that task is performed by trial-and-error procedures. Moreover, a single hidden layer can successfully reproduce any continuous function, as stated by Hornik et al. (1989). A number of hidden nodes of (2I + 1), I being the number of inputs, has been used with success in the ANN literature for geotechnical problems.
The single-hidden-layer network is tried using several numbers of nodes, from 3 up to a maximum of 25, checking that beyond (2I + 1) = 17 (I = 8 being the number of inputs) no improvement is achieved, as suggested by Caudill (1988). The transfer functions used at each layer to change inputs into outputs (see Sect. 5) can also be chosen among a variety of options. In the single-hidden-layer model, different combinations of transfer functions were tried (sigmoidal, hyperbolic tangent, and linear for the output layer) to identify those giving the most accurate predictions. Finally, a sigmoidal transfer function is adopted for the hidden layer and a linear one for the output layer.
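A minimal sketch of such a single-hidden-layer architecture follows; the weight values are random placeholders, not the trained values of Table 6, and the tangent-sigmoid expression is the one given later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 8-7-1 architecture: 8 inputs, 7 hidden nodes, 1 output.
# These weights are random placeholders; the trained values are in Table 6.
W_hidden = rng.normal(size=(7, 8))   # input -> hidden weights
b_hidden = rng.normal(size=7)        # hidden biases
W_out = rng.normal(size=(1, 7))      # hidden -> output weights
b_out = rng.normal(size=1)           # output bias

def tansig(t):
    # Tangent-sigmoid transfer function: 2 / (1 + exp(-2 t)) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * t)) - 1.0

def forward(x_n):
    """Propagate one vector of 8 normalized inputs through the net."""
    hidden = tansig(W_hidden @ x_n + b_hidden)  # sigmoidal hidden layer
    return (W_out @ hidden + b_out)[0]          # linear output layer

y_n = forward(np.full(8, 0.5))
```

Note that the tangent-sigmoid function is mathematically identical to the hyperbolic tangent, which is why both appear among the tried combinations.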
The network's performance, as a function of the number of hidden-layer nodes, is evaluated using as performance measures the correlation coefficient r, the root mean square percentage error RMSPE, and the mean absolute percentage error MAPE. These indicators are chosen because they measure the errors in relative terms, expressed as percentages, allowing an immediate and straightforward evaluation. The expression for each one is included in Table 4. The results for the different indexes are shown in Table 5.
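The three performance measures, written in their standard percentage definitions (the exact expressions of Table 4 are not reproduced in this extract, so the standard forms are assumed), can be sketched as:

```python
import numpy as np

def rmspe(target, pred):
    # Root mean square percentage error (%)
    return float(np.sqrt(np.mean(((target - pred) / target) ** 2)) * 100)

def mape(target, pred):
    # Mean absolute percentage error (%)
    return float(np.mean(np.abs((target - pred) / target)) * 100)

def corr_r(target, pred):
    # Pearson correlation coefficient between targets and predictions
    return float(np.corrcoef(target, pred)[0, 1])

# Illustrative targets and predictions:
t = np.array([1.0, 2.0, 4.0, 8.0])
p = np.array([1.1, 1.9, 4.2, 7.6])
```

Because both error measures divide by the target, they penalize relative rather than absolute deviations, which is exactly why the log scaling of the small P h values matters.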
The RMSPE and MAPE are also plotted against the number of hidden-layer nodes in Figs. 9 and 10, respectively. As can be seen, performance quickly improves as the number of nodes increases, improving markedly up to 5 nodes and stabilizing beyond 13 nodes. A network with seven hidden-layer nodes is selected as the optimum model, balancing the facts that the performance measures are in all cases around or below 5% and that the number of nodes is not very high. In the authors' opinion, reducing the errors to 2% does not compensate for increasing the number of hidden nodes to 13 or 14. The structure of the optimum neural network is shown in Fig. 11: an input layer with 8 neurons, a hidden layer of 7 neurons, and an output layer of 1 neuron.
The weights and biases of the network are detailed in Sect. 5.7.
The comparison between targets and predictions is superimposed in Fig. 12a, where a very good agreement is observed. The histogram of the errors obtained when predicting P h with the 7-hidden-node ANN is shown in Fig. 12b: the error is below 10% in most cases and never exceeds 20%. The plots of predicted versus calculated P h bearing capacity for the training, testing, and validation sets, and for all sets simultaneously, are shown in Fig. 13a-d, respectively, where the solid line indicates equality and the regression coefficient is indicated above each sub-figure. Since R is always higher than 0.999, it can be concluded that there is a very good correlation between the model predictions and the calculated target values.

Model Optimization
Optimizing the network is addressed through the training or learning process, which can begin once the network weights and biases are initialized. It requires a set of examples of correct network behavior (network inputs and target outputs to compare against in successive training steps).
The training process consists of tuning the values of the weights and biases of the network to optimize network performance which is checked using the performance function. Training in feed-forward neural networks commonly uses the back-propagation algorithm (Rumelhart et al. 1986), which involves performing computations backward through the network. Its details can be found in every basic reference on neural networks (Fausett 1994).
The performance function used in this procedure is the mean square error (the average squared error between the series of network outputs and the series of target outputs), though there are other options. At each training step, the gradient of the network performance with respect to the network weights is calculated. The training process stops when one of the stopping criteria is reached, as explained in Sect. 5.6. Different training algorithms may be applied to the network; it is difficult to know in advance the fastest method for a given problem, so trial and error seems the appropriate decision criterion. Among the algorithms offered in Matlab, the Quasi-Newton, resilient backpropagation, scaled conjugate gradient, and Levenberg-Marquardt optimization schemes were tried in this research. The Levenberg-Marquardt algorithm (Marquardt 1963) was selected since it produced faster and more accurate predictions for this problem. As a drawback, it requires more memory than other algorithms, but that was not a limitation in this particular model.

Stopping Criteria
Different stopping criteria are used to decide when to finish the training of the network (Shahin et al. 2008). Using the Matlab built-in Levenberg-Marquardt procedure, the training stops when any of these conditions occurs (MathWorks 2019b):
- The maximum number of steps is reached.
- The maximum amount of time is exceeded.
- The performance function is minimized to the goal.
- The performance gradient falls below a pre-fixed minimum.
- The damping parameter μ exceeds a pre-fixed maximum. In this algorithm, μ does not have a constant value but is decreased after each successful step from an initial value (0.01) towards a final value of 0 (in this case, a value of 1e−8 was reached). Using this procedure, the performance function is always reduced.
- The validation performance has increased more than a pre-fixed number of times since the last time it decreased (when validation is used).
In this research, the input data are divided into training, testing, and validation subsets, so the cross-validation criterion (Stone 1974) can be used. This is the most valuable criterion to avoid over-fitting (Smith 1996). From the training and testing subsets, the performance of the network is checked at each step and used to adjust the connection weights. The network is then validated using the independent validation subset, examining whether the performance improves as expected or, on the contrary, deteriorates (which indicates over-fitting), in which case the training stops. To avoid false minima, the training continues for several steps before concluding the process.
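The validation-check criterion can be sketched as follows; the limit of 6 consecutive checks used here is an assumption (it is the common Matlab default), not a value reported in the paper:

```python
def should_stop(val_errors, max_fail=6):
    """Early-stopping check on the validation error history.

    val_errors: list of validation errors, one per epoch so far.
    Training stops once the validation error has failed to improve for
    max_fail consecutive epochs since its best (lowest) value, which
    guards against over-fitting while tolerating false minima.
    """
    best_epoch = min(range(len(val_errors)), key=val_errors.__getitem__)
    return len(val_errors) - 1 - best_epoch >= max_fail
```

Continuing for several epochs past a rise in validation error, as the counter allows, is precisely what lets training escape the false minimum reported around epoch 400.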
The evolution of the performance (mean square error) of the net through the training process is shown in Fig. 14a, and the gradient, the damping parameter μ, and the validation checks in Fig. 14b. The optimum performance was reached at epoch 613 (iteration 613), although the training continued for 150 more epochs to avoid false minima (such as the one found around epoch 400, as can be seen in the validation check plot at the bottom).

Fig. 13 Measured versus calculated P h (target) for the optimal ANN model with respect to: a the training set, b validation set, c testing set, and d complete set

Weights Analysis of the ANN
The weights in the ANN represent the proportion of each input signal that is transmitted to the output. The neural network weight matrix obtained for the studied case (Table 6) can be used to estimate the relative importance of the various input variables on the output variable. The relative importance of the input variables is calculated using Garson's algorithm (GA) (Garson 1991), of common use in the neural network literature, which employs the absolute values of the weights in the calculation. Figure 15 shows the relative importance of the different inputs according to Garson's algorithm: GA shows a higher contribution from m 0, UCS, and GSI. These results will be confirmed in the next section, where a sensitivity study is carried out.
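A compact sketch of Garson's algorithm for a single-hidden-layer net is given below; the matrix shapes follow the 8-7-1 architecture, and in practice the weights would be those of Table 6 (not shown in this extract):

```python
import numpy as np

def garson_importance(W_hidden, W_out):
    """Relative importance of each input via Garson's algorithm.

    W_hidden: (n_hidden, n_inputs) input-to-hidden weight matrix
    W_out:    (n_hidden,) hidden-to-output weights
    Absolute weight values are used throughout, as in Garson (1991).
    """
    w = np.abs(W_hidden)
    # Share of each input within every hidden node, weighted by that
    # node's absolute connection strength to the output.
    contrib = (w / w.sum(axis=1, keepdims=True)) * np.abs(W_out)[:, None]
    importance = contrib.sum(axis=0)
    return importance / importance.sum()   # fractions summing to 1

# With perfectly symmetric weights, all 8 inputs are equally important:
imp = garson_importance(np.ones((7, 8)), np.ones(7))
```

The per-node normalization is what distinguishes Garson's measure from simply summing absolute weights: an input dominating a weakly connected hidden node contributes little to the output.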

Model Validation and Performance Measures
After training the model, a validation step should be performed to ensure, on one hand, its ability to generalize between the limits defined by the training data and, on the other hand, its robustness under a wide range of conditions (Shahin et al. 2008).
The simplest measure of performance is obtained using the already introduced statistical parameters: the correlation coefficient r, the root mean square percentage error RMSPE, and the mean absolute percentage error MAPE. This analysis was discussed in a previous section and is not repeated here.
However, validation against performance alone may result in a network that produces accurate predictions for situations similar to the training data but is not robust under different conditions. Shahin et al. (2005c) proposed performing a sensitivity analysis to check the changes in the outputs when the inputs change. The robustness of the model is then analyzed by examining how well the model predictions agree with the known underlying physical processes; that is, the changes in the predicted bearing capacity of shallow foundations on rock masses when the different parameters change should agree with what is expected from experience and from other analytical and numerical methodologies.
The sensitivity analysis investigates the response of the neural network to a set of hypothetical input data generated over the ranges between the minimum and maximum data used to train the model. One variable at a time is varied over its range while all others are fixed at a chosen value. The process is repeated for each variable, generating different sets of synthetic inputs.
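The one-at-a-time generation of synthetic inputs can be sketched as follows; the parameter limits and reference values used here are illustrative placeholders, not the actual Table 3 values:

```python
import numpy as np

# Hypothetical training-range limits and reference values for four of the
# numerical inputs (the real limits are those of Table 3 in the paper).
limits = {"m0": (5, 32), "UCS": (5, 100), "GSI": (10, 85), "B": (2, 20)}
reference = {"m0": 12, "UCS": 50, "GSI": 50, "B": 11}

def one_at_a_time(varied, n=5):
    """Generate synthetic inputs varying one parameter across its training
    range while all other parameters stay fixed at the reference values."""
    lo, hi = limits[varied]
    rows = []
    for v in np.linspace(lo, hi, n):
        row = dict(reference)
        row[varied] = float(v)
        rows.append(row)
    return rows

rows = one_at_a_time("UCS", n=3)
```

Feeding each such set through the trained net and plotting the output against the varied parameter produces curves like those of Figs. 16-18.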
The UCS parameter is chosen as the reference for all the analyses, since it is one of the most influential parameters on the foundation response and its influence is clearly known from the analytical formulation, which predicts that the bearing capacity grows proportionally to the UCS (Serrano et al. 2000).
The variation of the results with m 0 and GSI against UCS is shown in Fig. 16 for the cases of associated dilatancy (Fig. 16a, c) and null dilatancy (Fig. 16b, d), keeping the other parameters constant. It reveals that the bearing capacity is highly dependent on the compressive strength UCS of the rock, increasing almost linearly as UCS increases. The growth of the bearing capacity with the UCS becomes more pronounced with higher values of the parameters m 0 (Fig. 16a) and GSI (Fig. 16c), the result being much more sensitive to the variation of the GSI than of the m 0, as reported in previous research (Serrano and Olalla 1994; Serrano et al. 2000; Merifield et al. 2006; Saada et al. 2008). The results for the case of null dilatancy (Fig. 16b, d) show a significant reduction of the bearing capacity relative to the associated dilatancy case (Fig. 16a, c). The reduction is more important for m 0 (Fig. 16b), reaching 37% for m 0 = 32, while for GSI (Fig. 16d) the reduction is about 24% for GSI = 85. These results are consistent with the literature (Alencar et al. 2019).
The influence of B and self-weight, which are closely related, is presented in Fig. 17. The results are obtained for the reference case m 0 = 12, GSI = 10, associated dilatancy, 2D case, and rough contact. When a weightless rock is considered (Fig. 17a), no difference appears in the results for different values of B, as expected from the analytical solution of the problem. If self-weight is introduced (Fig. 17b), a significant variation in the results appears, more evident for larger values of UCS, proving an important influence of B for the selected case (the influence is smaller for higher values of GSI).
All figures show a coherent and smooth variation with the parameters, proving that the neural network offers a robust response within the input domain where it was trained.
The influence of the different hypotheses of the model (dilatancy, shape, roughness, and self-weight) is presented in Fig. 18, where intermediate values of the parameters are chosen as the reference (m 0 = 10, B = 11 m, GSI = 50) and each hypothesis is changed between 0 and 1 in turn to assess its influence on the results. Starting from the 2D case with associated dilatancy, rough contact, and weightless rock (Dilat = 1, Sh = 0, Rg = 1, Wh = 0), the figure shows an important reduction in P h when the dilatancy changes to null, as expected. A smaller reduction is obtained when the contact changes from rough to smooth, since a rough interface stiffens the rock under the foundation, and only a slight increment is observed when the self-weight of the rock is considered, reflecting the additional work necessary to move the failure wedge. On the contrary, an important increase in P h appears when an axisymmetric foundation is considered instead of the 2D one, coherent with the well-known fact in geotechnical engineering, supported by theoretical results, that the bearing capacity is smaller in the 2D case. All results are consistent with previous research on the subject (Alencar et al. 2019).
The previous results confirm that the optimum network performs well and can be successfully used to coherently predict the bearing capacity of shallow foundations on rock masses.

Results and Discussion
The main result of this research is the ANN model, which is conveniently converted into a set of simple equations to facilitate its use, so that it can be easily implemented in a spreadsheet.
The detailed equations are presented in the following, together with some discussion of their scope and limitations. The optimal ANN previously developed has 8 inputs, 7 neurons in the hidden layer, and only one output, as shown schematically in Fig. 11.
The equations are obtained from the weights and biases provided by the Matlab software (applying the convenient scaling and re-scaling process to the inputs and the output), using the following expression (Ranasinghe et al. 2017):

y_n = f_outp { θ_k + Σ_(j=1..7) [ w_kj · f_hidden ( θ_j + Σ_(i=1..8) w_ji · x_ni ) ] }   (7)

where y_n is the single normalized output variable, representing the bearing capacity; θ_k is the threshold value at the output layer and w_kj is the connection weight between the jth node in the hidden layer and the kth node in the output layer; θ_j is the threshold value of the jth hidden node and w_ji is the connection weight between the ith input node and the jth hidden node; x_ni is the ith normalized input variable; and f_outp and f_hidden are the transfer functions at the output and hidden nodes, respectively. Expression (7) can be implemented recursively, considering that f_hidden corresponds to the tangent-sigmoid function f(t) = 2/(1 + e^(−2t)) − 1 and f_outp to the linear function. Note that the inputs x_ni in Eq. (9) are the normalized values. The weights and biases provided by the Matlab software corresponding to the optimal ANN are detailed in Table 6.

(Fig. 16 caption) Variation of the bearing capacity P h: relation between m 0 and UCS for the associated (a) and null dilatancy (b) cases; relation between GSI and UCS for the associated (c) and null dilatancy (d) cases. Variable codes: Dilat: dilatancy, Sh: shape, Rg: roughness, Wh: self-weight.
It is important to remember that reverse normalization should be applied to the result of Eq. (8) to obtain the desired value of the bearing capacity P h. Considering that the variables are normalized in the range (h_min, h_max) = (0, 1), the following expressions should be used:
- Direct normalization of the input values x_i to normalized values x_ni:

x_ni = (x_i − x_min−i) / (x_max−i − x_min−i)   (10)

- Reverse normalization of the normalized output value y_n to the output value y:

y = (y_n − h_min) · (y_n−max − y_n−min) / (h_max − h_min) + y_n−min = 9.179 y_n − 1.715   (11)

- Reverse log conversion of the output value y to obtain the final limit load:

P_h = e^y   (12)

x_max−i and x_min−i being the maximum and minimum values of the ith input parameter (x_i) in the data used to train the neural network, which can be found in Table 3.
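The normalization chain (direct normalization, reverse normalization with the constants 9.179 and −1.715 reported for this model, and reverse log conversion) can be sketched as:

```python
import numpy as np

def normalize_input(x, x_min, x_max):
    # Eq. (10): direct (0, 1) normalization of an input value
    return (x - x_min) / (x_max - x_min)

def denormalize_output(y_n):
    # Eq. (11): reverse normalization of the network output, using the
    # constants of the trained model: y = 9.179 * y_n - 1.715
    return 9.179 * y_n - 1.715

def limit_load(y):
    # Eq. (12): reverse of the natural-log scaling applied to P_h
    return np.exp(y)
```

Chaining `limit_load(denormalize_output(...))` after the network's raw output y_n yields the bearing capacity in physical units.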
From the results obtained in the previous section, the equation representing the optimal ANN (already including the double normalization previously explained) can be expressed as:

P_h = exp[9.179 · (−0.1108 T_1 − 6.1851 T_2 + 8.0001 T_3 + …) − 1.715]   (13)

where the different terms T_1 to T_7 are obtained using Eq. (9). As an example, the term T_1 has the form T_1 = f_hidden(θ_1 + Σ_(i=1..8) w_1i · x_ni), with the weights w_1i and bias θ_1 taken from Table 6, x_ni being the normalized (not the original) input values. Using the previous equations, the trained ANN is easily reproduced, and a very complex problem that would otherwise demand an elaborate model can be straightforwardly addressed.
Some parameters, such as those corresponding to the modified Hoek and Brown criterion or the dilatancy, are not easily available in some software solutions, which makes the proposed ANN especially powerful and useful.
It is important to remark that the scope of the proposed ANN is restricted to the limits adopted for the input parameters; predictions made outside those limits may not be accurate.
Although interpolation between the input limits is possible and will produce relatively accurate results, it should not be used for non-homogeneous inputs that are not associated with a real numerical value. A more detailed explanation for the different cases follows:
- Parameters with a numerical value: m 0, UCS (MPa), GSI, and B (m) can be given any value between the input limits. For example, for m 0 the values adopted in the net were 5/12/20/32, but any value between 5 and 32 can be used as input for new predictions; using 5/12/20/32 simply assures a higher precision.
- Parameters that cannot be related homogeneously to a numerical value:
o Self-weight. Although a 0/1 value was adopted for this parameter, it corresponds to the real values of 0 and 26 kN/m3. Interpolation between those values is not recommended, since only the two extreme cases were used to train the net and the predictions may not be accurate.
o Dilatancy. This case differs from self-weight because associated dilatancy does not correspond to a unique value; it depends on the stress level (the angle of the tangent to the Hoek and Brown criterion at the point of failure). Interpolation for dilatancy values other than 0-1 may give inaccurate results.
- No interpolation applies to the shape and contact roughness parameters, which are strictly 0/1.
From the previous discussion, it follows that the case of a shallow foundation on a saturated rock mass with the phreatic level at the ground surface can also be studied with the network, by assigning the specific weight of the submerged rock to the self-weight parameter.

Summary and Conclusions
This research deals with the calculation of the bearing capacity of shallow foundations on rock masses due to general shear failure. Different tools can be used to address the problem, but they either involve excessive simplification (empirical equations, analytical solutions) or are quite complex and require high expertise (numerical models). A different approach, based on neural networks and soft computing, is available nowadays and can overcome both limitations.
An artificial neural network (ANN) is created to predict the aforementioned bearing capacity of shallow foundations on rock masses, based on a large number of numerical calculations used as input data for training the net.
Since it is based on complex numerical calculations performed with the FLAC software, the network retains the predictive capacity of the method it replaces, provided that it uses the same main representative parameters and that an adequate building and training process assures a good accuracy of the predictions.
The inputs used in the ANN are three parameters characterizing the rock mass, namely the rock type (m 0), the uniaxial compressive strength (UCS), and the geological strength index (GSI), together with the foundation width (B), the dilatancy on the failure surface, the bidimensional or axisymmetric nature of the problem, the rough or smooth foundation-rock contact, and the weightless or self-weighted rock mass. The only output is the bearing capacity (P h). Most professional foundation problems can be properly addressed using this set of parameters, which even covers a wider range than usual (e.g., by considering dilatancy).
Once the inputs and outputs are defined, the net must be built and trained. The structure and optimal size of the network are obtained by comparing the results of nets with different numbers of hidden nodes (from 3 to 25), training functions, and transfer functions. Each architecture is trained and its predictions are compared with the target values, analyzing the network performance by means of the correlation coefficient r, the root mean square percentage error RMSPE, and the mean absolute percentage error MAPE. The final optimal architecture is a single hidden layer with 7 nodes, balancing the facts that the previous global performance measures are in all cases around or below 5% and that the number of nodes is not very high. The individual prediction error is below 10% in most cases and never exceeds 20%.
This network can deal with problems similar to those of an elaborate numerical model and assures good accuracy. A sensitivity analysis was carried out to ensure the robustness of the proposed model. Both the performance measures and the sensitivity analysis show a very good agreement with the expected results, proving that the model can be used to predict the bearing capacity. Its only limitation is the need to use input values within the limits defined to train the network, as explained in the paper.
Finally, the neural network is converted into a set of simplified design equations to facilitate the use of the model through hand or spreadsheet calculations. By using these equations, no specific training or experience is needed to solve the problem (unlike with numerical methods) beyond the necessary knowledge to properly define the input parameters and correctly interpret the obtained bearing capacity.

Appendix: Illustrative Numerical Example
To illustrate the detailed use of the previous bearing capacity equation, consider a simple case of a shallow foundation on a rock mass with m 0 = 20, B = 11 m, UCS = 50 MPa, and GSI = 85. We also consider associated dilatancy (1), plane strain (0), rough contact (1), and self-weighted rock (1).
The following steps should be followed. First, the input data should be scaled to the (0, 1) range using Eq. (10), x_ni and x_i being the normalized and original values of the ith parameter, and x_max−i and x_min−i the maximum and minimum values of the original data, respectively.
The original and normalized values for the studied case are shown in Table 7.
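The whole appendix procedure (normalize, forward pass, reverse normalization, reverse log conversion) can be sketched end to end; the weights below are random placeholders for the Table 6 values and the input limits are hypothetical stand-ins for Table 3, so the numerical output is illustrative only and does not reproduce the paper's prediction:

```python
import numpy as np

# Random placeholder weights standing in for the trained values of Table 6.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(7, 8)), rng.normal(size=7)
W2, b2 = rng.normal(size=7), rng.normal()

# Step 1: normalize the 8 raw inputs to (0, 1) with hypothetical Table 3 limits.
# Input order: m0, UCS, GSI, B, Dilat, Sh, Rg, Wh
x_raw = np.array([20, 50, 85, 11, 1, 0, 1, 1], dtype=float)
x_min = np.array([5, 5, 10, 2, 0, 0, 0, 0], dtype=float)
x_max = np.array([32, 100, 85, 20, 1, 1, 1, 1], dtype=float)
x_n = (x_raw - x_min) / (x_max - x_min)

# Step 2: forward pass, tangent-sigmoid hidden layer and linear output (Eq. 7).
y_n = W2 @ np.tanh(W1 @ x_n + b1) + b2

# Step 3: reverse normalization and reverse log conversion (Eqs. 11-12).
p_h = np.exp(9.179 * y_n - 1.715)
```

With the actual trained weights and limits, this pipeline reproduces the spreadsheet calculation described in the appendix.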
Then, the final limit load is obtained from Eq. (13); this step includes the reverse normalization of the normalized output value y_n, from the final output neuron, to the output value y, and the reverse log conversion of the output value y to obtain the final limit load. The predicted result shows a 3.4% error compared with the target value, equal to 402.7 kN.

Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Availability of data and material (data transparency) Not applicable.
Code availability (software application or custom code) Not applicable.

Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.