1 Introduction

Loess is a silty deposit formed by a combination of factors under arid or semi-arid conditions during the Quaternary period [1]. At its natural moisture content, loess has low compressibility and a certain degree of strength. However, under the joint action of water and vertical stress, its structure is destroyed rapidly and wet subsidence occurs.

In recent years, with the rapid development of microscopic techniques, researchers in China and abroad have used them to study the microstructure of soil. Lei Xiangyi, Dibben S C et al. [2, 5] used microscopic techniques to evaluate the relationship between pore type and the wet subsidence of loess. Lin Guanghua, Tovey et al. [3,4,5,6] used microscopic techniques to study the microstructure of loess and assessed the effect of soluble salts, pore characteristics, and other factors on its wet subsidence properties.

For the wet subsidence model, the main approach in China is to fit empirical formulas to data obtained from wet subsidence tests. For example, relationships between the physical property indexes of loess and the coefficient of wet subsidence have been constructed using the least squares method [7], linear fitting [11], and other techniques. To further improve the predictive performance of empirical formulas, Lingxia Gao and Ping Li [12] used principal component analysis to construct multifactor regression equations relating the loess wet subsidence coefficient to water content, plasticity index, and pore ratio. Meanwhile, as computer performance has improved, the available computing power has been used for data mining to further improve the prediction accuracy of wet subsidence models. For example, the BP neural network [8] and fuzzy algorithms [9, 10] have been used to predict the coefficient of loess wet subsidence, which further reduces the prediction error and makes the models more valuable for engineering applications.

To keep the selected physical property indexes strongly independent and to improve the prediction accuracy of the model, this paper uses factor analysis to screen the main factors and a genetic algorithm to optimize the initial network parameters of the BP neural network.

2 Factor Analysis Based on Physical Property Indicators of Loess

Because the mechanism of loess wet subsidence is complex, the physical indicators that affect the wet subsidence coefficient are not completely independent. If the selected indexes are too strongly correlated, the resulting BP neural network model tends to overfit; if too few indexes are selected, the accuracy of the model is reduced. This section therefore selects characteristic indexes based on the theory of wet subsidence deformation of loess and uses factor analysis to screen out relatively independent physical property indexes. A BP neural network is then built, and a genetic algorithm is used to optimize its weights and thresholds to improve accuracy.

2.1 Selection of Variables

According to the “Wet Loess Area Construction Standard” (GB50025-2018), when the depth of the overlying soil layer is less than 10 m, the wet subsidence coefficient is tested at a vertical stress of 200 kPa; when the depth is greater than 10 m, it is tested at the saturated self-weight stress of the overlying soil. This paper uses the wet subsidence coefficient measured at a vertical stress of 200 kPa, i.e., the vertical stress is treated as a constant, and excludes data with a wet subsidence coefficient below 0.015. According to the relevant literature, the wet subsidence coefficient may be related to factors including pore ratio and dry density; initial moisture content and degree of saturation; and plastic limit, liquid limit, and plasticity index.

2.2 Factor Analysis

The data in the database are first standardized so that the physical indicators are consistent in scale and order of magnitude. The correlation coefficient matrix of the physical indicators is then calculated to determine whether strong correlations exist among them. The number of factors is selected according to the variance contribution rate and cumulative variance contribution rate of each factor. If the correlation between the extracted factors and the physical indicators is not strong, the factors are rotated to obtain main factors that correlate highly with the physical indicators. In this paper, the factor analysis of the physical indicator data is performed in SPSS; the specific steps are as follows.

Determination of the Correlation Matrix

The test data obtained in this project under a vertical stress of 200 kPa were combined with data collected from the literature [9,10,11,12], giving a total of 120 sets of test data, of which 60 sets were selected for correlation analysis.

Based on the standardized physical property data set, the correlation coefficients between the variables were calculated, as shown in Table 1. The correlation coefficient matrix shows a good correlation between moisture content and saturation, a strong correlation between dry density and pore ratio, and strong correlations among liquid limit, plastic limit, and plasticity index.
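The standardization and correlation step can be illustrated with a minimal Python sketch. The paper performs this step in SPSS, so the code below is only an illustration: the function names and the sample values are hypothetical and are not taken from the paper's data set.

```python
import math

def standardize(column):
    """Z-score a list of values using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]

def correlation_matrix(columns):
    """Pearson correlation matrix of several equally long columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

# Hypothetical values for illustration only (not the paper's data).
water_content = [12.1, 15.3, 18.7, 21.0, 16.4]
saturation = [35.0, 44.2, 55.1, 63.8, 47.9]
dry_density = [1.32, 1.41, 1.38, 1.50, 1.45]

corr = correlation_matrix([water_content, saturation, dry_density])
for row in corr:
    print(["%.3f" % r for r in row])
```

Entries close to 1 or -1 off the diagonal indicate indicators that carry overlapping information, which is the situation factor analysis is meant to resolve.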

Table 1. Matrix of correlation coefficients among physical property indicators

Factor Determination

Based on the correlation analysis of the collected data, factors are extracted to reduce the dimensionality of the original data structure. The importance of each factor was measured by its variance contribution rate and the cumulative variance contribution rate, and the factors were then selected, as shown in Table 2. Three factors were retained, with a cumulative variance contribution rate of 99%.
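As a rough illustration of how the variance contribution rate and the cumulative variance contribution rate lead to a factor count, consider the following sketch. The eigenvalues and the 0.95 cutoff are hypothetical assumptions chosen for the example; the paper does not state its selection threshold, and the values in Table 2 come from SPSS.

```python
def variance_contributions(eigenvalues):
    """Return (contribution rates, cumulative rates) for the given eigenvalues."""
    total = sum(eigenvalues)
    rates = [ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    return rates, cumulative

def select_factor_count(eigenvalues, min_cumulative):
    """Smallest number of leading factors whose cumulative rate reaches the cutoff."""
    ordered = sorted(eigenvalues, reverse=True)
    _, cumulative = variance_contributions(ordered)
    for count, value in enumerate(cumulative, start=1):
        if value >= min_cumulative:
            return count
    return len(ordered)

# Hypothetical eigenvalues of a 7-indicator correlation matrix (they sum to 7).
eigenvalues = [3.1, 2.2, 1.5, 0.1, 0.05, 0.03, 0.02]
rates, cumulative = variance_contributions(eigenvalues)
n_factors = select_factor_count(eigenvalues, min_cumulative=0.95)  # assumed cutoff
print(n_factors)
```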

Table 2. Cumulative variance contribution of factors

Factor Rotation

In this paper, the maximum variance (varimax) rotation method is used to rotate the factor loading matrix. The rotated loadings in Table 3 show that the first factor is most strongly influenced by liquid limit, plastic limit, and plasticity index; the second factor is mainly related to pore ratio and dry density; and the third factor is strongly connected with water content and pore ratio. Based on the factor analysis, this paper selects plasticity index, pore ratio, and water content to represent all the physical property indexes.

Table 3. Factor loads before and after rotation

3 Wet Subsidence Modeling

Based on the factor analysis of the data set, water content, pore ratio, and plasticity index are taken as the input indexes, which keeps the indexes relatively independent and reduces the dimensionality of the data structure.

3.1 Construction and Optimization of Wet Subsidence Model

Introduction to BP Neural Networks

Unlike traditional linear regression methods, BP neural networks use data to train the network parameters and thereby establish the mapping between the variables and the predicted values, without requiring this relationship to be specified beforehand. BP neural networks are therefore well suited to nonlinear regression. Their main characteristics are as follows.

Nonlinear mapping ability: A BP neural network essentially implements a mapping from input to output, and it has been proven mathematically that a three-layer neural network with sufficiently many hidden nodes can approximate any nonlinear continuous function with arbitrary accuracy. This strong nonlinear mapping capability makes it particularly suitable for problems with complex internal mechanisms.

Self-learning and adaptive ability: During training, a BP neural network can automatically extract “reasonable rules” between input and output data through learning and adaptively store what it has learned in the network's weights.

Generalization ability: Generalization refers to the ability of a trained BP neural network to apply what it has learned to new data. When designing a pattern classifier, for example, one must consider not only whether the network correctly classifies the training objects but also whether, after training, it correctly classifies unseen patterns or patterns contaminated with noise.

Fault tolerance: Damage to some local neurons of a BP neural network does not significantly affect the global training results, so the system can still work normally even when it is locally damaged.

Establishment Steps

In this paper, a BP neural network prediction model is developed from the database [9,10,11,12], which combines experimental data with data collected from the literature. The specific steps are as follows:

Input and output layers: Water content, pore ratio, and plasticity index are taken as the input features, so the input layer has 3 nodes. These indexes are strongly independent of one another and together characterize the whole data structure. There is one output layer with 1 node, the wet subsidence coefficient.

Hidden layer: In general, increasing the number of hidden layers can improve the accuracy of the results, but it also makes network training take too long. This paper therefore uses a single hidden layer, and the number of its nodes is determined by the empirical formula in Eq. (1):

$$ n_{1} = \sqrt{n + m} + \alpha $$
(1)

where n1 is the number of nodes in the hidden layer, n is the number of nodes in the input layer, m is the number of nodes in the output layer, and α is an adjustment constant chosen in the range [1, 10].
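As a worked instance of Eq. (1), the sketch below enumerates the candidate hidden layer sizes for the 3 input nodes and 1 output node used in this paper. Restricting α to integers and rounding the result are assumptions made here for illustration; the function name is hypothetical.

```python
import math

def hidden_node_candidates(n_inputs, n_outputs, alpha_min=1, alpha_max=10):
    """Candidate hidden layer sizes from n1 = sqrt(n + m) + alpha, alpha an integer."""
    base = math.sqrt(n_inputs + n_outputs)
    # Rounding is an assumption; it only matters when sqrt(n + m) is not an integer.
    return [round(base + alpha) for alpha in range(alpha_min, alpha_max + 1)]

candidates = hidden_node_candidates(n_inputs=3, n_outputs=1)
print(candidates)  # sqrt(3 + 1) = 2, so the candidates run from 3 to 12
```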

Indicator data normalization: To keep the input data within a specified range, so that the neural network converges and the training time is reduced, the selected indicator data must be normalized before modeling. In this paper, the sample data are normalized to the range [−1, 1].
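A minimal sketch of this normalization is given below, assuming simple min-max scaling of each indicator column. The paper does not name the scaling routine it uses, so the function and its handling of constant columns are assumptions.

```python
def scale_to_range(values, low=-1.0, high=1.0):
    """Min-max scale a list of values linearly onto [low, high]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Assumption: a constant column is mapped to the middle of the range.
        return [(low + high) / 2.0 for _ in values]
    return [low + (high - low) * (v - v_min) / span for v in values]

# Hypothetical water contents in percent, for illustration only.
scaled = scale_to_range([10.0, 15.0, 20.0])
print(scaled)  # [-1.0, 0.0, 1.0]
```

In practice the minimum and maximum found on the training data would be reused when scaling the validation data, so that both sets share one mapping.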

Network training: The input, hidden, and output layers are connected through the weights and thresholds between layers. The network is trained by forward propagation of the inputs and backward propagation of the error, which continuously updates the weights and thresholds between layers so that the network's predictions move closer to the expected values. Finally, part of the data is used to validate the neural network prediction model.
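The training procedure can be sketched in plain Python as follows. This is not the paper's MATLAB program: the tanh hidden activation, linear output, learning rate, epoch count, initial weight range, and the synthetic samples are all assumptions chosen to keep the example small, and thresholds are represented as additive bias terms.

```python
import math
import random

def init_network(n_in, n_hidden, n_out, rng):
    """Random initial weights and thresholds; the range [-0.5, 0.5] is an assumption."""
    def u():
        return rng.uniform(-0.5, 0.5)
    return {
        "w1": [[u() for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [u() for _ in range(n_hidden)],
        "w2": [[u() for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [u() for _ in range(n_out)],
    }

def forward(net, x):
    """Forward pass: tanh hidden layer, linear output layer (assumed activations)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    y = [sum(w * hj for w, hj in zip(row, h)) + b
         for row, b in zip(net["w2"], net["b2"])]
    return h, y

def train_step(net, x, target, lr):
    """One backpropagation update for a single sample (squared-error loss)."""
    h, y = forward(net, x)
    delta_out = [yk - tk for yk, tk in zip(y, target)]
    # Propagate the output error back through w2 and the tanh derivative.
    delta_hid = [(1.0 - hj * hj) * sum(d * net["w2"][k][j] for k, d in enumerate(delta_out))
                 for j, hj in enumerate(h)]
    for k, d in enumerate(delta_out):
        for j, hj in enumerate(h):
            net["w2"][k][j] -= lr * d * hj
        net["b2"][k] -= lr * d
    for j, d in enumerate(delta_hid):
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * d * xi
        net["b1"][j] -= lr * d

def mean_squared_error(net, samples):
    return sum((forward(net, x)[1][0] - t[0]) ** 2 for x, t in samples) / len(samples)

rng = random.Random(0)
# Synthetic, already normalized samples from a made-up smooth function (not the paper's data).
inputs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
samples = [(x, [math.tanh(0.8 * x[0] - 0.5 * x[1] + 0.3 * x[2])]) for x in inputs]

net = init_network(n_in=3, n_hidden=7, n_out=1, rng=rng)
mse_before = mean_squared_error(net, samples)
for _ in range(300):
    for x, t in samples:
        train_step(net, x, t, lr=0.05)
mse_after = mean_squared_error(net, samples)
print(mse_before, mse_after)
```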

3.2 GA Optimization Modeling

The genetic algorithm (GA) is an intelligent global optimization algorithm, proposed by the American professor J. Holland in 1975, that is modeled on biological genetic evolution.

Genetic algorithms draw on selection and inheritance, gene crossover, and gene mutation as described in Mendelian genetic theory, and they are characterized by global optimization, parallel processing, and high generality. In each iteration, a set of the best solutions is first preserved so that it is not eliminated; the fitness function values are then used to select superior individuals, and the selected individuals are recombined using the genetic operators to form a new population. The new population has evolved relative to the previous generation, and the iteration terminates once the preset goal is reached.

Because the initial parameters of a BP neural network are random, using a genetic algorithm to optimize the configuration of the network parameters can reduce the error on the prediction set.

Initial Population Creation

First, all the parameters are flattened into chromosome form: all the weights and thresholds of the BP neural network form an N-dimensional vector that serves as an individual chromosome, where N = input vector dimension × number of hidden layer nodes + number of hidden layer nodes + output vector dimension × number of hidden layer nodes + output vector dimension. The initial population size is set to M = 10, so an M × N matrix is randomly generated as the initial population.

In the loess wet subsidence model of this paper, the input vector has dimension 3 (water content, pore ratio, and plasticity index) and the output vector has dimension 1 (the wet subsidence coefficient).
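The chromosome encoding can be sketched as follows. The formula for N follows the text, while the gene range [-1, 1], the gene ordering inside the chromosome, and all function names are assumptions made for illustration.

```python
import random

def chromosome_length(n_in, n_hidden, n_out):
    """N = input-hidden weights + hidden thresholds + hidden-output weights + output thresholds."""
    return n_in * n_hidden + n_hidden + n_out * n_hidden + n_out

def initial_population(pop_size, length, rng, low=-1.0, high=1.0):
    """Random M x N population; the gene range [low, high] is an assumption."""
    return [[rng.uniform(low, high) for _ in range(length)] for _ in range(pop_size)]

def decode(chromosome, n_in, n_hidden, n_out):
    """Unflatten one chromosome into weights and thresholds (assumed gene order)."""
    genes = iter(chromosome)
    w1 = [[next(genes) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [next(genes) for _ in range(n_hidden)]
    w2 = [[next(genes) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [next(genes) for _ in range(n_out)]
    return w1, b1, w2, b2

rng = random.Random(0)
n = chromosome_length(n_in=3, n_hidden=12, n_out=1)  # 3*12 + 12 + 1*12 + 1 = 61
population = initial_population(pop_size=10, length=n, rng=rng)
w1, b1, w2, b2 = decode(population[0], 3, 12, 1)
print(n, len(population))
```

With 7 hidden layer nodes the same formula gives N = 36, so the chromosome length changes with every candidate hidden layer size.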

Selection, Crossover, Mutation

In this paper, the roulette algorithm is used as the selection operator because it tends to pass individuals with higher fitness function values on to the next generation. In each iteration, the fitness function value of each individual in the current population is calculated first. The individuals with the best and the worst fitness are then saved into a preset space, and the selection probability of each of the other individuals is calculated. Finally, individuals are selected according to their selection probabilities, and the chosen individuals replace those in the original population. After crossover and mutation are completed, the fitness of every individual in the current population is calculated, and the individual with the best fitness in the previous generation replaces the individual with the worst fitness in the current population.
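Roulette selection with preservation of the best individual can be sketched as below. The code assumes non-negative fitness values where larger is better; how the paper converts prediction error into fitness is not stated, so that conversion is left out, and the function names are hypothetical.

```python
import random

def roulette_select(population, fitness, n_select, rng):
    """Pick n_select individuals with probability proportional to fitness."""
    total = sum(fitness)
    probabilities = [f / total for f in fitness]
    chosen = []
    for _ in range(n_select):
        r = rng.random()
        cumulative = 0.0
        for individual, p in zip(population, probabilities):
            cumulative += p
            if r < cumulative:
                chosen.append(list(individual))
                break
        else:
            # Guard against floating-point rounding at the end of the wheel.
            chosen.append(list(population[-1]))
    return chosen

def apply_elitism(new_population, new_fitness, elite, elite_fitness):
    """Replace the worst individual of the new generation with the previous best."""
    worst = min(range(len(new_fitness)), key=new_fitness.__getitem__)
    new_population[worst] = list(elite)
    new_fitness[worst] = elite_fitness

rng = random.Random(1)
population = [[0.0], [1.0], [2.0], [3.0]]
fitness = [0.1, 0.2, 0.6, 0.1]  # hypothetical fitness values
selected = roulette_select(population, fitness, n_select=4, rng=rng)
print(selected)
```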

To maintain a diverse population, parts of the structures of two individuals from the previous generation are exchanged and recombined to form new individuals, a process called crossover. In this paper, two-point crossover is chosen as the crossover operator, and the crossover probability is set to 0.8.
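A minimal sketch of two-point crossover with the stated probability of 0.8 follows. The way the two cut points are drawn is an assumption, and the function name is hypothetical.

```python
import random

def two_point_crossover(parent_a, parent_b, rng, p_cross=0.8):
    """Swap the gene segment between two random cut points with probability p_cross."""
    child_a, child_b = list(parent_a), list(parent_b)
    if len(parent_a) >= 3 and rng.random() < p_cross:
        # Two distinct cut points i < j, so the swapped segment is never empty.
        i, j = sorted(rng.sample(range(1, len(parent_a)), 2))
        child_a[i:j], child_b[i:j] = parent_b[i:j], parent_a[i:j]
    return child_a, child_b

rng = random.Random(3)
a, b = [0] * 6, [1] * 6
child_a, child_b = two_point_crossover(a, b, rng)
print(child_a, child_b)
```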

To give the genetic algorithm local search ability and to maintain population diversity, a mutation operator is introduced to generate new individuals. In this paper, Gaussian mutation is chosen as the mutation operator, and the mutation probability is 0.2.
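Gaussian mutation can be sketched as below. The paper gives only the mutation probability of 0.2; applying it gene by gene and using a standard deviation of 0.1 are assumptions for illustration.

```python
import random

def gaussian_mutation(chromosome, rng, p_mut=0.2, sigma=0.1):
    """Add zero-mean Gaussian noise to each gene with probability p_mut."""
    mutated = []
    for gene in chromosome:
        if rng.random() < p_mut:
            gene = gene + rng.gauss(0.0, sigma)  # sigma is an assumed step size
        mutated.append(gene)
    return mutated

rng = random.Random(5)
original = [0.1, -0.2, 0.3, 0.4]
mutant = gaussian_mutation(original, rng)
print(mutant)
```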

Termination Conditions

The algorithm terminates when the error of the best individual falls to 0.00001 or when the preset maximum of 100 generations is reached.
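The two stopping rules can be combined in a small driver loop, sketched here. The loop is generic: `error_of` and `next_generation` are hypothetical callables standing in for the fitness evaluation and for the selection, crossover, and mutation steps, and the halving example is a toy problem rather than the paper's model.

```python
def run_ga(population, error_of, next_generation, target_error=1e-5, max_generations=100):
    """Iterate until the best error reaches the target or the generation cap is hit."""
    best = min(population, key=error_of)
    generations = 0
    while error_of(best) > target_error and generations < max_generations:
        population = next_generation(population)
        generations += 1
        # Keep the best individual seen so far.
        best = min(population + [best], key=error_of)
    return best, generations

# Toy example: individuals are numbers, the error is |x|, and each generation halves them.
best, generations = run_ga([1.0, 2.0], abs, lambda pop: [x / 2 for x in pop])
print(best, generations)
```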

3.3 Model Accuracy Validation and Result Analysis

The model was implemented in MATLAB 2020b, with the BP neural network and the genetic algorithm written together in a single program.

Relative Error Analysis

According to Eq. (1), the number of hidden layer nodes of the BP neural network ranges over [3, 12]. Training over this range shows that the genetic algorithm optimized model is most accurate with 12 hidden layer nodes, where its average relative error is 5.81%; in this case, the average relative error of the BP neural network prediction model is 32.71%. Genetic algorithm optimization thus lowers the average relative error of the predicted loess wet subsidence coefficient by about 27 percentage points, and the resulting error is below 20%, which achieves the expected effect. The specific prediction data for 12 hidden layer nodes are shown in Table 4.
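For reference, the relative error figures quoted in this section are assumed here to be absolute percentage errors relative to the measured value, averaged over the prediction set. The paper does not write out the formula, so the sketch below shows that assumed definition with hypothetical numbers.

```python
def relative_errors(measured, predicted):
    """Absolute relative error of each prediction, in percent of the measured value."""
    return [abs(p - m) / abs(m) * 100.0 for m, p in zip(measured, predicted)]

def mean_relative_error(measured, predicted):
    """Average of the per-sample relative errors, in percent."""
    errors = relative_errors(measured, predicted)
    return sum(errors) / len(errors)

# Hypothetical wet subsidence coefficients, for illustration only.
measured = [0.050, 0.040]
predicted = [0.045, 0.044]
print(round(mean_relative_error(measured, predicted), 2))
```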

Table 4 shows that the BP neural network predicts loess with medium wet subsidence poorly, whereas after genetic algorithm optimization the relative error is within 12.19% and the predictions are more reliable.

Table 4. Prediction data for 12 hidden layer nodes

With 7 hidden layer nodes, the BP neural network prediction model reaches its highest accuracy, with an average relative error of 19.49%; in this case, the average relative error of the genetic algorithm optimized model is 9.39%. The specific prediction data are shown in Table 5. Table 5 shows that the relative errors of the BP neural network predictions are all within 34.62%, so this model has a large error. After genetic algorithm optimization, the prediction errors for all data except the first group are within 17.83%, and the average relative error is about 10 percentage points lower.

When the number of hidden layer nodes is too small, prediction accuracy is poor for loess without wet subsidence. Because this paper studies the wet subsidence coefficient of loess in which wet subsidence occurs, such samples are excluded. After this exclusion, the errors of the GA-optimized model on the prediction set are within 17.83%, which is below 20% and meets the expected accuracy.

Table 5. Prediction data for 7 hidden layer nodes

3.4 Analysis of Results

For the BP neural network optimized by the genetic algorithm, the optimal initial network parameters can be selected for training during the iterative process. As a result, the accuracy of the optimized neural network improves further as the number of hidden layer nodes increases.

The prediction analysis shows that the genetic algorithm can compensate for the loss of prediction accuracy caused by the randomness of the initial network parameters. With 12 hidden layer nodes, the relative error of the prediction model is within 12.19% and all types of error are reduced to varying degrees, so the predictions are reliable and of good practical engineering value.

4 Conclusions

(1) Using factor analysis, water content, pore ratio, and plasticity index were selected as factors based on the variance contribution rate and cumulative variance contribution rate, and a BP neural network was used to establish a prediction model relating the wet subsidence coefficient of loess to its physical property indexes.

(2) Because the genetic algorithm can optimize the initial network parameters of the BP neural network, the GA-BP neural network prediction model achieves higher prediction accuracy than the BP neural network, yields predictions closer to the measured values, and has strong engineering practicability.

(3) The GA-BP neural network model created in this paper is most accurate with 3 input layer nodes, 1 output layer node, and 12 hidden layer nodes, where the average relative error of the predicted data is 5.81%.
