Application of machine learning to the identification of quick and highly sensitive clays from cone penetration tests

Geotechnical classification is vital for site characterization and geotechnical design. Field tests such as the cone penetration test with pore water pressure measurement (CPTu) are widespread because they represent a faster and cheaper alternative to sample recovery and testing. However, classification schemes based on CPTu measurements are fairly generic because they represent a wide variety of soil conditions, and they may occasionally fail when used in special soil types such as sensitive or quick clays. Quick and highly sensitive clay soils in Norway have unique conditions that make them difficult to identify through general classification charts. Therefore, new approaches to address this task are required. This study applies machine learning methods, namely logistic regression, Naive Bayes, and hidden Markov models, to classify quick and highly sensitive clays at two sites in Norway based on normalized CPTu measurements. Results showed a considerable increase in the classification accuracy despite limited training sets.


Introduction
One of the primary concerns in the majority of construction projects in Norway is the presence of highly sensitive or quick clays, which significantly affects the feasibility of such projects. As cone penetration tests with pore water pressure measurement (CPTus) are widespread and present in almost every geotechnical exploration program in Norway, it is convenient to determine whether a soil profile contains quick clays based on the CPTu test results. The use of CPTu for soil classification is a common practice, particularly using the well-known classification charts found in (Lunne et al., 1997). However, a major challenge comes to light when the soil deposits comprise non-textbook soils, as in the case of quick or highly sensitive clays. In these cases, alternatives should be determined to maintain the convenience of using indirect field measurements without expending a large amount of resources.
In this context, the use of machine learning approaches is ideal as local data can be used to train a model to learn how the measured data characterize a certain kind of soil. With little information, results will not be satisfactory; however, as the exploration advances, the model will learn from the newly obtained data and adjust itself to provide better results.
This study investigates the potential of machine learning techniques to improve the identification of highly sensitive and quick clay soils using CPTu. All computations performed within this study use Python (van Rossum, 1995) as the programming environment. Machine learning algorithms used are logistic regression and Naive Bayes, as programmed in the scikit-learn library (Pedregosa et al., 2011), and the hidden Markov model (HMM), available in the hmmlearn library (hmmlearn, 2010).
The methodology followed considers the analysis of two CPTu datasets from previous studies at sites wherein highly sensitive and quick clays were encountered and wherein the layering (lithology) at each test location is known. The CPTu data were then used to classify the samples using well-known classification charts and machine learning methods. Finally, the results were compared against the actual layering, and performance measurements were computed to compare the different approaches.

Norwegian geo-test sites dataset
The Norwegian geo-test sites (NGTS) dataset originates from a research consortium led by the Norwegian Geotechnical Institute (NGI), with the participation of the Norwegian University of Science and Technology and other organizations. Its main focus is to develop field laboratories for the testing, verification, and control of new methods and equipment for site investigations and foundation engineering (NGI, 2019). Within the NGTS framework, an important study subject is quick clays, for which the site at Tiller (Trondheim, Norway) was chosen. Fig. 1 presents the location of the CPTus, while Figs. 2 and 3 show the summary of the tests alongside the layering of the site. In this study, 31 CPTus were used (CPTu C18 was discarded due to high sleeve friction ($f_s$), which was not representative of the site).
The layering of the site consisted of 2.0 m of dry crust, followed by a clay layer down to a depth of 7.5 m, on top of a 12.5 m thick quick clay layer. The water table was 1.5 m below the surface (a hydrostatic condition was assumed). The terrain was flat, so the features described above were expected to vary little over the study area. The lithology was based on previous studies conducted at the site, including soundings and laboratory tests. Detailed information can be found in (L'Heureux et al., 2019).

Vegvesen dataset
This dataset consisted of seven CPTus that were part of the studies for the construction of a bridge on part of County Road 715 (Fv. 715) in Trøndelag County, Norway. The bridge foundations were planned to be placed in an area at high risk for quick clay slides. The soil layering at the site was not as regular as that at the NGTS site; the common sequence was a stiff upper layer followed by clay on top of a thick quick clay layer, and subsequent to a certain depth, there appeared clay or stiffer soil layers.
The location of the tests is shown in Fig. 4, while the layering and water table depth are shown in Fig. 5 (a hydrostatic condition was assumed for the groundwater). It is important to note that in this case, the layering was proposed by the authors based on the information from the site data report by Statens Vegvesen (2013), which included laboratory and field tests. The summary of the tests is shown in Figs. 6 and 7.

Data processing
The CPTu data were received as raw files in ".cpt" format, comprising measurements of the depth, tip resistance ($q_c$), $f_s$, and pore pressure behind the cone ($u_2$). The tip resistance value was corrected for the effects of the pore pressure acting behind the conical tip using the following formula: $q_t = q_c + (1 - r_a) \cdot u_2$, where $r_a$ is the net area ratio, which depends on the probe design, and $q_t$ is the corrected tip resistance. The normalized parameters $Q_t$, $F_r$, $B_q$, and $U_2$ were then computed from the corrected measurements.
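The correction and normalization step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard pore pressure correction $q_t = q_c + (1 - a) u_2$ and the widely used definitions $Q_t = (q_t - \sigma_{v0})/\sigma'_{v0}$, $F_r = 100 f_s/(q_t - \sigma_{v0})$, $B_q = (u_2 - u_0)/(q_t - \sigma_{v0})$, and $U_2 = (u_2 - u_0)/\sigma'_{v0}$. The function names, the uniform soil unit weight, and the example readings are hypothetical.

```python
GAMMA_W = 9.81  # unit weight of water in kN/m^3


def corrected_tip_resistance(q_c, u_2, area_ratio):
    """Corrected tip resistance q_t = q_c + (1 - a) * u_2 (stresses in kPa)."""
    return q_c + (1.0 - area_ratio) * u_2


def normalized_parameters(depth, q_t, f_s, u_2, unit_weight, water_table):
    """Return (Q_t, F_r, B_q, U_2) for a single CPTu reading.

    Assumptions (not taken from the paper): a uniform total unit weight
    (kN/m^3) above the reading and a hydrostatic pore pressure profile.
    Depths are in m and stresses in kPa; F_r is returned in percent.
    """
    sigma_v0 = unit_weight * depth                  # total vertical stress
    u_0 = GAMMA_W * max(depth - water_table, 0.0)   # hydrostatic pore pressure
    sigma_v0_eff = sigma_v0 - u_0                   # effective vertical stress
    q_net = q_t - sigma_v0                          # net cone resistance
    if sigma_v0_eff <= 0.0 or q_net <= 0.0:
        raise ValueError("reading cannot be normalized")
    Q_t = q_net / sigma_v0_eff
    F_r = 100.0 * f_s / q_net
    B_q = (u_2 - u_0) / q_net
    U_2 = (u_2 - u_0) / sigma_v0_eff
    return Q_t, F_r, B_q, U_2


# Hypothetical reading at 10 m depth with the water table at 1.5 m.
q_t = corrected_tip_resistance(q_c=600.0, u_2=400.0, area_ratio=0.8)
Q_t, F_r, B_q, U_2 = normalized_parameters(
    depth=10.0, q_t=q_t, f_s=5.0, u_2=400.0, unit_weight=18.0, water_table=1.5
)
```

With these definitions, $U_2 = B_q \cdot Q_t$, so the two pore pressure parameters carry related information scaled by different stresses.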
In the present study, for the machine learning classification, logarithmic transformations were applied to the normalized parameters to fit the data in the (0, 1) range. $U_2$ was preferred over $B_q$, as it is a better pore pressure parameter for soil type identification according to Schneider et al. (2008). The transformed normalized parameters are denoted $Q_t^{norm}$, $F_r^{norm}$, and $U_2^{norm}$.
To provide a baseline against which to compare the machine learning approach, the classification was first performed using well-known charts that consider sensitive soils in their classification schemes. The charts used were those recommended by Robertson (1990, 2016), Eslami and Fellenius (1997), Schneider et al. (2008), and Gylland et al. (2017). The metric used to evaluate the accuracy was the accuracy score (A.S.), defined as follows:
$$\mathrm{A.S.} = \frac{\text{Number of correctly classified samples}}{\text{Total number of samples}}.$$
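As a minimal sketch, the accuracy score can be computed by comparing the known layering with the predicted class at each reading. The labels below are invented for illustration; scikit-learn provides an equivalent `accuracy_score` function in `sklearn.metrics`.

```python
def accuracy_score(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the known class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical five-reading profile: one clayey reading is predicted as sensitive.
layering = ["other", "clayey", "clayey", "sensitive", "sensitive"]
predicted = ["other", "clayey", "sensitive", "sensitive", "sensitive"]
score = accuracy_score(layering, predicted)
```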
As this part of the work was focused on predicting the appearance of highly sensitive and quick clays from the CPTu measurements, only three soil classes were considered: sensitive, clayey, and other (coarser or stronger). The classification results were consequently adjusted to measure the classification accuracy.
The results of using the Robertson (1990) classification chart are shown in Figs. 8 and 9. For comparison, in this study, soil class 1 was considered as quick and sensitive clay, soil classes 3 and 4 as clayey, and soil classes 2, 5, 6, 7, 8, and 9 as other. For the Vegvesen dataset, the accuracy score of this classification was A.S. = 28% for the $Q_t$-$F_r$ chart. From the chart, it is evident that the classification results for the Vegvesen dataset had a low accuracy score because of the high $Q_t$ of the site's sensitive clays compared with the zone defined by Robertson (1990). However, the results for the NGTS site showed better agreement with the chart, especially the $Q_t$-$F_r$ plot.
Figure caption: Classification in (Robertson, 1990) showing the Vegvesen dataset: (a) $Q_t$-$F_r$; (b) $Q_t$-$B_q$. References to color refer to the online version of this figure
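The grouping of the nine Robertson (1990) zones into the three classes used in this study can be written as a simple lookup. The dictionary and function names are hypothetical; only the zone-to-class assignment comes from the text.

```python
# Zone 1 -> sensitive, zones 3 and 4 -> clayey, all remaining zones -> other.
ROBERTSON_1990_TO_STUDY_CLASS = {
    1: "sensitive",
    2: "other",
    3: "clayey",
    4: "clayey",
    5: "other",
    6: "other",
    7: "other",
    8: "other",
    9: "other",
}


def to_study_class(zone):
    """Map a Robertson (1990) zone number to the three-class scheme."""
    if zone not in ROBERTSON_1990_TO_STUDY_CLASS:
        raise ValueError(f"unknown Robertson (1990) zone: {zone}")
    return ROBERTSON_1990_TO_STUDY_CLASS[zone]


# Hypothetical chart output for four readings.
study_classes = [to_study_class(z) for z in [1, 3, 4, 6]]
```

After this adjustment, chart results and known layering share the same label set, so the accuracy score can be evaluated directly.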

Eslami and Fellenius (1997)
This classification chart was developed when investigating the use of the cone penetration test (CPT) in pile design using data from 20 sites in five countries. In this case, the "effective" cone resistance and $f_s$ values are used instead of the normalized ones. The effective cone resistance is defined as $q_E = q_t - u_2$. The chart defines five classes: (1) sensitive and collapsible clay and/or silt; (2) clay and/or silt; (3) silty clay and/or clayey silt; (4) sandy silt and/or silty sand; (5) sand and/or sandy gravel.
The results of using the chart in (Eslami and Fellenius, 1997) with the datasets in this study are shown in Figs. 10a and 10b. Better agreement was observed in the identification of sensitive soils for the Vegvesen dataset compared with Robertson (1990). In both datasets (though more clearly in that of NGTS), a major overlap between the clayey and quick clay soils was visible. The accuracy scores were 63% for NGTS and 74% for Vegvesen.

Schneider et al. (2008)
The work performed by Schneider et al. (2008) focused on improving the simple classification charts available at that time to consider the effects of undrained penetration on penetration resistance. The chart was plotted on a $Q_t$-$U_2$ space. The database used in that study included sensitive soils from Norway and Canada. The classification chart is divided into five different zones: (1a) silts and low-rigidity-index ($I_r$) clays, (1b) clays, (1c) sensitive clays, (2) essentially drained sands, and (3) transitional soils.
Figs. 10c and 10d show both datasets plotted on the $Q_t$-$U_2$ space. It was observed that the sensitive clays from the Vegvesen dataset showed a behavior closer to that predicted by the scheme, with an accuracy score of 75%. Meanwhile, in the NGTS dataset, there were more cases of "false positives", meaning that a large fraction of the clay layer was classified as sensitive when it was not; however, the accuracy score was also 75%.

Robertson (2016)
This chart updates that proposed by Robertson (1990) by using behavior-based descriptions and an updated normalized tip resistance $Q_{tn}$. Seven zones are defined in this classification system: (1) clay-like contractive sensitive (CCS); (2) clay-like contractive (CC); (3) clay-like dilative (CD); (4) transitional contractive (TC); (5) transitional dilative (TD); (6) sand-like contractive (SC); (7) sand-like dilative (SD). Fig. 11 shows both datasets plotted on the $Q_{tn}$-$F_r$ space. Once again, the NGTS dataset presented a large fraction of the clayey soil as sensitive, but the other two soil classes fell well within the correct groups. Meanwhile, the Vegvesen dataset showed a large fraction of sensitive soils classified as TC or CC. The accuracy score was 75% for NGTS and 52% for Vegvesen.

Gylland et al. (2017)
This work proposes a classification chart specifically focused on the identification of sensitive clays and is based on tests performed in Norway. It uses parameters that follow the same philosophy as Robertson (1990) but with a different normalization: $N_{mc}$, $B_{q1}$, and $R_{fu}$. In the reference stress used for this normalization, $\sigma'_c$ is the effective pre-consolidation stress, $a$ is the attraction, $m$ is the SHANSEP-framework exponent (typically between 0.7 and 0.8 for Norwegian clays), and $\Delta u_1$ is the excess pore pressure at the tip of the cone ($u_1$).
The main drawback of this scheme is that it requires parameters that are not necessarily associated with the CPT itself, e.g. the attraction and pre-consolidation stress. Moreover, it requires knowledge of $u_1$, which is not usually measured because $u_2$ is preferred. Therefore, in this case, it was necessary to use correlations involving the measured parameters to estimate those required by the model.
Figs. 12 and 13 show both datasets plotted on the chart proposed by Gylland et al. (2017). It was evident that the NGTS dataset showed better agreement than the Vegvesen dataset. The accuracy score here was only a binary classification score due to the nature of the classification proposed by the authors. The results are summarized below:
NGTS: A.S. = 86% for the $N_{mc}$-$B_{q1}$ chart, A.S. = 87% for the $N_{mc}$-$R_{fu}$ chart;
Vegvesen: A.S. = 39% for the $N_{mc}$-$B_{q1}$ chart, A.S. = 40% for the $N_{mc}$-$R_{fu}$ chart.
The Vegvesen dataset plotted almost completely outside the red shaded area defining sensitive clays, demonstrating a different behavior of the sensitive clays present in the area compared with those that were part of the dataset used by the authors, which included the NGTS site at Tiller.
Fig. 10 Classification in (Eslami and Fellenius, 1997) showing the NGTS (a) and Vegvesen (b) datasets, and classification in (Schneider et al., 2008) showing the NGTS (c) and Vegvesen (d) datasets. Legend: other, clayey, sensitive-quick. References to color refer to the online version of this figure

Machine learning classifiers
The machine learning algorithms for the classification used in this work were logistic regression, Naive Bayes, and an HMM, as included in the Python libraries scikit-learn (Pedregosa et al., 2011) and hmmlearn (hmmlearn, 2010). Briefly, logistic regression uses a linear model to create a decision boundary that separates different classes, and the Naive Bayes approach uses a probabilistic framework based on Bayes' theorem, with the simplifying assumption that the predictors are conditionally independent given the class. Finally, an HMM uses Markov chains and a probabilistic framework to model the spatial correlation between measured data. The performance of the models was measured through the accuracy score introduced previously.

Logistic regression classifier
A logistic regression classifier was sequentially trained, and the results of the predictions are shown in Fig. 14 for different sets of predictors, including the three parameters together.
Results showed a high accuracy score, even for the first estimation (using only one CPTu to train), with a sharp increase afterwards. Accuracy scores of at least 80% were reached using only four tests to train the classifier. It was also noted that the $Q_t^{norm}$-$U_2^{norm}$ scheme showed higher accuracies with fewer data. It is important to highlight that the NGTS site is a highly homogeneous site with a regular layering sequence and low dispersion of the measured parameters. These results should not be expected to occur in non-homogeneous sites.

Naive Bayes classifier
Results of using a Naive Bayes classifier on the NGTS dataset are shown in Fig. 15. The accuracy of the first estimation was more scattered, especially when using $F_r^{norm}$, but it quickly increased afterwards. The primary advantage of the Naive Bayes approach compared with logistic regression is that its run time is roughly one-tenth that of the latter, which may make a major difference in large datasets.
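To illustrate the idea behind the classifier, the sketch below implements a Gaussian Naive Bayes from scratch. It is not the scikit-learn implementation used in the study, and the text does not state which Naive Bayes variant was applied; the Gaussian likelihood, the function names, and the feature values (two transformed parameters in the (0, 1) range) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior and per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small floor avoids a zero variance for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Conditional independence: per-feature log likelihoods are summed.
        log_likelihood = sum(
            -0.5 * math.log(2.0 * math.pi * var) - (x - m) ** 2 / (2.0 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Hypothetical training readings: [Q_norm, U_norm] pairs with known classes.
X_train = [
    [0.20, 0.80], [0.25, 0.75], [0.22, 0.78],
    [0.40, 0.60], [0.45, 0.55], [0.42, 0.58],
    [0.80, 0.20], [0.75, 0.25], [0.78, 0.22],
]
y_train = ["sensitive"] * 3 + ["clayey"] * 3 + ["other"] * 3
model = fit_gaussian_nb(X_train, y_train)
predictions = [
    predict_gaussian_nb(model, row)
    for row in [[0.21, 0.79], [0.43, 0.57], [0.77, 0.23]]
]
```

Training only requires class-wise means and variances, which is consistent with the short run time reported above.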

HMM
For this part, an HMM was trained in a semi-supervised manner. The model parameters (transition matrix, means, covariances, and starting probabilities) were first estimated from the training data. Then, the model was allowed to update (optimize) the values of the transition matrix and covariances in the expectation-maximization stage, while the rest remained fixed. The Viterbi algorithm was used to determine the most likely sequence of states (soil classes). Sequential training was performed, but due to a restriction in the programmed code, only CPTu tests in which all three soil classes were present could be used. This restriction reduced the number of combinations available, but it was assumed that there were still enough to draw conclusions from. The results of the sequential training and classification are shown in Fig. 16.
It was observed that although the prediction was less accurate with few data, the accuracy quickly increased, yielding results similar to those obtained using the other two methods. The advantage of using an HMM is that because it considers the likelihood of changing from one class (hidden state) to another, the predicted profiles do not contain unrepresentative thin layers. This can be seen in Fig. 17 for CPTus C04 and C10.
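The smoothing effect of the transition probabilities can be illustrated with a small Viterbi decoder. This is a sketch under stated assumptions rather than the hmmlearn-based model of the study: there are only two states, the per-reading emission likelihoods are given directly instead of being computed from Gaussian emissions, and all probabilities are invented.

```python
import math


def viterbi(start_prob, trans_prob, emission_lik):
    """Most likely state sequence given per-reading emission likelihoods.

    emission_lik[t][s] is the likelihood of reading t under state s.
    Log probabilities are used to avoid numerical underflow.
    """
    n_states = len(start_prob)
    score = [
        math.log(start_prob[s]) + math.log(emission_lik[0][s])
        for s in range(n_states)
    ]
    backpointers = []
    for lik in emission_lik[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(
                range(n_states),
                key=lambda r: prev[r] + math.log(trans_prob[r][s]),
            )
            pointers.append(best)
            score.append(
                prev[best] + math.log(trans_prob[best][s]) + math.log(lik[s])
            )
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# State 0 = clayey, state 1 = sensitive (hypothetical two-class profile).
start = [0.6, 0.4]
transitions = [[0.9, 0.1], [0.1, 0.9]]  # a class change between readings is unlikely
likelihoods = [
    (0.9, 0.1), (0.9, 0.1), (0.4, 0.6), (0.9, 0.1),
    (0.1, 0.9), (0.1, 0.9), (0.1, 0.9),
]
pointwise = [max((0, 1), key=lambda s: lik[s]) for lik in likelihoods]
decoded = viterbi(start, transitions, likelihoods)
```

Classifying each reading independently yields `[0, 0, 1, 0, 1, 1, 1]`, with a one-reading "sensitive" layer at the third position, whereas the decoded sequence is `[0, 0, 0, 0, 1, 1, 1]` because two class changes for a single weakly supported reading are less likely than staying in the same class.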

Site profiles
To visually compare the different classification methods, Figs. 17 and 18 display several selected CPTus with the actual site layering alongside the machine learning classification. The accuracy of each profile estimation is presented in Table 1.
For the profile estimation, only seven CPTus were used for the training to classify each test. The training dataset was selected by sorting the tests by name and using the seven closest to the CPTu to be classified. The median accuracy scores were 96% for the logistic regression, 97% for the Naive Bayes, and 96% for the HMM. These values were quite similar to each other, but the mean error was considerably higher for the HMM (Table 1). A visual comparison showed that the HMM estimated some profiles that were not close to reality, as in the case of C05, which may have been due to the training set chosen and could be improved by including more data in the training phase. For NGTS, the machine learning classification that performed best was the Naive Bayes, closely followed by the logistic regression. Table 2 shows the accuracy score for each soil type. In this case, the accuracy for quick and sensitive clays was higher than that for the other soil types (which was still high, with accuracy scores over 80%). Table 2 also demonstrates the relatively worse performance of the HMM.

Results of the Vegvesen dataset
Since the Vegvesen dataset was much smaller than the NGTS dataset and the layering of the site was not as homogeneous, its machine learning classification was more challenging because in the process of splitting the dataset to train and test, a major share of the information was lost (it was not possible to use it to train). Thus, in this case, in addition to performing the sequential training and prediction, a "leave-one-group-out" cross-validation technique (Scikit-learn, 2019) was used to assess the performance of the classifiers. Here, a group was represented by a CPTu.
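The leave-one-group-out idea can be sketched in a few lines: every reading carries the name of the CPTu it belongs to, and each CPTu is held out once while all others are used for training. The generator below is a stand-alone illustration with hypothetical test names; scikit-learn offers the same splitting strategy as `LeaveOneGroupOut` in `sklearn.model_selection`.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical group label (CPTu name) of six readings from three soundings.
groups = ["CPTu-1", "CPTu-1", "CPTu-2", "CPTu-2", "CPTu-2", "CPTu-3"]
splits = list(leave_one_group_out(groups))
```

Splitting by sounding rather than by individual reading keeps readings from the same CPTu out of the training set when that CPTu is being evaluated.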

Logistic regression classifier
Results are shown in Fig. 19. More scattered behavior and lower accuracy were observed in general compared with the NGTS site; however, given that this dataset had fewer CPTus and the soil layering was more complex, the results were good. Compared with the classification charts, the logistic regression classifier improved the results. This was even more evident when analyzing the cross-validation results shown in Table 3.

Naive Bayes classifier
Results are shown in Fig. 20. As with the logistic regression, these results were more scattered and less accurate, but upon using the remaining six tests for the training, the accuracies increased considerably, as shown in Table 4. In the case of the $Q_t$-$F_r$ classification, the results showed a median accuracy score of 91%, which was much more accurate than any of the classification charts.

HMM
The results of the sequential training and classification are shown in Fig. 21, while the results of the cross-validation are shown in Table 5.
The results showed high accuracies that were generally reached after using four tests to train the model.

Site profiles
Results of the profiles estimated by the machine learning classification of the CPTus from the Vegvesen dataset are shown in Figs. 22 and 23. The median accuracy scores were 79% for the logistic regression, 91% for the Naive Bayes, and 87% for the HMM. Here, as in the NGTS dataset, the Naive Bayes classification displayed the highest accuracy and the lowest mean error, as shown in Table 6. Table 7 shows the accuracy scores for the classification of each type of soil considered. In this case, there was a high accuracy for the highly sensitive and quick clay identification but a low accuracy for the other soil types. This may have been a reflection of the heterogeneity of the soil investigated compared with that of the NGTS dataset. Here, the label "Other" might have covered a more varied range of soils.

Discussion
In general, soil classification charts meant for broad use fail to capture special soils like quick or highly sensitive clays. This study used three machine learning approaches to improve the identification of sensitive soils from CPTus: logistic regression, Naive Bayes, and HMM classifications.
The approaches were tested on two different datasets comprising soils of different characteristics. In the case of the NGTS site, wherein the soil layering was regular and the layer characteristics were homogeneous, the three methods displayed an excellent performance, as measured by an accuracy score well above 90%. Although the three methods had a high accuracy, that of the HMM was slightly lower than the other two, particularly in two profiles, C05 and C17, as shown in Table 1 and Fig. 17. An advantage of the HMM is that it considers the spatial connection between the data, which can be seen in the reduced presence of thin unrepresentative layers. The Vegvesen dataset had fewer data available to train the models, but the results were still quite favorable, with accuracy scores above 80%.
The sequential training of the models graphically demonstrated how they learned from the data as it was incorporated. Both datasets showed that using only four tests to train the model significantly improved the accuracy of the classification. This could be helpful when performing site investigations wherein the model can be trained as the data are retrieved from the soundings and laboratory tests or even when in-place visual classification is performed. Once the model is successfully trained, the need for laboratory tests might be reduced. This could work even in less homogeneous datasets, such as that from Vegvesen.
Since this study primarily focused on comparing different classification approaches and discussing the advantages of following a machine learning classification approach, a simple performance measure was used: the accuracy score. It is recommended that a better performance measure, e.g. the balanced accuracy score or the F1 score (Scikit-learn, 2019), be used for further research on this topic. It would also be interesting to explore other more advanced machine learning classification approaches, such as ordinal classifications or neural networks.
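The difference between the plain accuracy score and the suggested alternatives can be shown with a small, invented example in which one class dominates. The functions below are a stand-alone sketch; scikit-learn provides `balanced_accuracy_score` and `f1_score` in `sklearn.metrics`.

```python
def per_class_recall(true_labels, predicted_labels):
    """Fraction of correctly identified samples within each true class."""
    recalls = {}
    for label in sorted(set(true_labels)):
        indices = [i for i, t in enumerate(true_labels) if t == label]
        hits = sum(predicted_labels[i] == label for i in indices)
        recalls[label] = hits / len(indices)
    return recalls


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of the per-class recalls, so every class weighs the same."""
    recalls = per_class_recall(true_labels, predicted_labels)
    return sum(recalls.values()) / len(recalls)


def f1_for_class(true_labels, predicted_labels, label):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical profile dominated by sensitive clay; everything is predicted sensitive.
truth = ["sensitive"] * 8 + ["clayey"] + ["other"]
guess = ["sensitive"] * 10
plain = sum(t == p for t, p in zip(truth, guess)) / len(truth)
balanced = balanced_accuracy(truth, guess)
f1_sensitive = f1_for_class(truth, guess, "sensitive")
```

In this example the plain accuracy is 0.8 although two of the three classes are never identified, while the balanced accuracy drops to 1/3, which is why such measures are more informative for imbalanced profiles.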
Additionally, for further research on this topic, more data representing a broader range of soil properties are required, with clear identification of clays as sensitive, highly sensitive, or quick. Moreover, the authors encourage the incorporation of recent models aiming at identifying quick clays. A good example of such a study is in (Valsson, 2016).

Conclusions
In this study, a set of Python libraries was used to train three different machine learning algorithms: logistic regression, Naive Bayes, and an HMM. Results demonstrated the ability of these methods to learn from the data, with good classification accuracies reached after only four training CPTus. However, these methods are not meant to be used as general classification solutions unless they are trained with a large dataset that includes the different soil conditions to be detected. The major challenge here involved obtaining enough data to train the models as well as enough laboratory results to verify them.
In the future, it would be interesting to keep researching these methodologies and their applications in the geotechnical field since they have proven to yield good results that can aid engineers in optimizing both field and laboratory tests.