Abstract
The cone penetration test (CPT) is widely used for soil characterization and for determining physical soil parameters. Traditionally, the interpretation of CPT results relies on the experience and expertise of geotechnical engineers. Recent advances in machine learning, however, have produced models that can predict soil properties from CPT data, and these techniques can provide more accurate and consistent predictions, even for complex soil conditions. This article therefore evaluates the performance of two machine learning algorithms, random forest and deep learning, in predicting tip resistance (qc) and sleeve resistance (fs) from soil classification inputs using CPT data. The work used a database of tests from regions of Germany and Austria, initially comprising more than two million observations, which allowed model generalization to be assessed across different regions. The random forest regressor achieved a coefficient of determination of 0.94 for tip resistance (qc) and 0.82 for sleeve resistance (fs), outperforming the deep neural networks. When the model was applied to different test regions, the coefficients of determination ranged from 0.65 to 0.68 for qc and from 0.14 to 0.75 for fs. Practical implications include the possibility of obtaining the design parameters qc and fs from inputs provided by simpler tests, which would reduce project costs, improve the quality and efficiency of CPTs, and support decision-making in geotechnical projects.
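The workflow summarized in the abstract (fit a random forest regressor that predicts qc and fs jointly, score each target with the coefficient of determination, then check how the fitted model transfers to data from another region) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the data are synthetic, the relations between depth, soil class, qc and fs are invented, and the helper `make_synthetic_cpt`, its `shift` parameter, and all hyperparameter values are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_cpt(n_rows, seed, shift=0.0):
    """Return made-up CPT-like data: (depth, soil class) -> (qc, fs).

    Hypothetical helper: the formulas below are invented for illustration
    and do not come from the study's database.
    """
    rng = np.random.default_rng(seed)
    depth = rng.uniform(0.5, 30.0, n_rows)      # depth, assumed in metres
    soil_class = rng.integers(1, 10, n_rows)    # SBT-like class label, 1..9
    qc = 0.5 + 0.15 * depth + 1.2 * soil_class + shift
    qc = qc + rng.normal(0.0, 0.8, n_rows)
    fs = 5.0 + 1.5 * depth + 4.0 * (10 - soil_class) + 10.0 * shift
    fs = fs + rng.normal(0.0, 6.0, n_rows)
    X = np.column_stack([depth, soil_class])
    y = np.column_stack([qc, fs])               # two targets: qc and fs
    return X, y


# "Region A" is used for training and a held-out test split.
X_a, y_a = make_synthetic_cpt(4000, seed=0)
# "Region B" has a systematic offset to mimic a different site.
X_b, y_b = make_synthetic_cpt(1000, seed=1, shift=1.0)

X_train, X_test, y_train, y_test = train_test_split(
    X_a, y_a, test_size=0.2, random_state=42
)

# RandomForestRegressor accepts a 2-D target, so one model predicts both
# qc and fs. The hyperparameters here are arbitrary, not tuned values.
model = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=5, random_state=42
)
model.fit(X_train, y_train)

# multioutput="raw_values" returns one R2 per target column.
r2_test = r2_score(y_test, model.predict(X_test), multioutput="raw_values")
r2_other = r2_score(y_b, model.predict(X_b), multioutput="raw_values")

print("R2 on held-out split (qc, fs):", r2_test)
print("R2 on shifted region (qc, fs):", r2_other)
```

Reporting one score per target, rather than an average, mirrors how the abstract quotes separate values for qc and fs; scoring on a separately generated region is a simple stand-in for the cross-region check.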
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. The Python scripts are also available in the following GitHub repository: https://github.com/vlpacheco.
Abbreviations
- AI: Artificial intelligence
- CNN: Convolutional neural networks
- CPT: Cone penetration test
- CPTu: Cone penetration test with pore pressure measurement
- DNNs: Deep neural network
- fs: Sleeve resistance
- GPUs: Graphics processing unit
- kPa: Kilopascal
- MAE: Mean absolute error
- MAPE: Mean absolute percentage error
- ML: Machine learning
- MPa: Megapascal
- MSE: Mean square error
- qc: Tip resistance
- R2: Coefficient of determination
- RF: Random forest
- RFE: Recursive feature elimination method
- RFR: Random forest regression
- RMSE: Root mean square error
- SBT: Soil behavior type
- SBTn: Normalized soil behavior type
- SCPT: Seismic cone penetration test
- SCPTu: Seismic cone penetration test with pore pressure measurement
- SPT: Standard penetration test
Funding
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001 and by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq–Grants #312756/2017-8 and #314643/2020-6).
Author information
Contributions
VLP, LB, FDR and AT designed the content and logic of the experimental features and algorithms. VLP and LB wrote the first draft of the manuscript, and FDR and AT revised it. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pacheco, V.L., Bragagnolo, L., Dalla Rosa, F. et al. Cone Penetration Test Prediction Based on Random Forest Models and Deep Neural Networks. Geotech Geol Eng 41, 4595–4628 (2023). https://doi.org/10.1007/s10706-023-02535-0
DOI: https://doi.org/10.1007/s10706-023-02535-0